#data-science-and-ml | Python | Page 402

distant horizon May 12, 2022, 2:25 PM

#

Do it

misty flint May 12, 2022, 3:32 PM

#

praise

shrewd saddle May 12, 2022, 3:34 PM

#

Is there any reason tanh could cause more overfitting than ReLU in a CNN? I am trying out a CNN model, and the training accuracy with tanh is always higher than with ReLU, even though the test accuracy is more or less the same

iron rampart May 12, 2022, 3:51 PM

#

Does someone know what kind of data this is from keras? keras.datasets.fashion_mnist

tidal bough May 12, 2022, 3:58 PM

#

iron rampart Does someone know what kind of data this is from keras? ``keras.datasets.fashion...

https://www.tensorflow.org/api_docs/python/tf/keras/datasets/fashion_mnist/load_data

This is a dataset of 60,000 28x28 grayscale images of 10 fashion categories, along with a test set of 10,000 images. This dataset can be used as a drop-in replacement for MNIST.

TensorFlow

tf.keras.datasets.fashion_mnist.load_data | TensorFlow Core v2.8.0

Loads the Fashion-MNIST dataset.

#

or similar docs on keras's site: https://keras.io/api/datasets/fashion_mnist/

Keras documentation: Fashion MNIST dataset, an alternative to MNIST

celest vine May 12, 2022, 4:31 PM

#

Hi

#

Is anyone here?

vagrant trench May 12, 2022, 4:32 PM

#

yes

celest vine May 12, 2022, 4:32 PM

#

I needed help with something

#

I have a dataset that contains followers data of a Twitter account.

#

The data contains username and profile pic URL.

#

But I want the actual profile pic in JPEG format

#

How to get that

#

I have around 300k users data.

#

That means 300k profile pic URLs

serene scaffold May 12, 2022, 4:51 PM

#

@celest vine if the Twitter API gives you the URL for the profile picture, you can probably use requests to download the jpeg. but make sure you're not attempting to download the profile picture for the same person more than once, so you don't add needless load to their servers

#

this page talks about how to pick a version of a profile picture of a given size. I would download the smallest one that is suitable for your purposes: https://developer.twitter.com/en/docs/twitter-api/v1/accounts-and-users/user-profile-images-and-banners

User profile images and banners

celest vine May 12, 2022, 4:54 PM

#

serene scaffold @celest vine if the Twitter API gives you the URL for the profile pictu...

How much time will it take to download 300k profile pics using requests?

serene scaffold May 12, 2022, 4:55 PM

#

celest vine How much time will it take to download 300k profile pics using requests?

no idea

#

it's going to depend on the speed of your network and of their network, and everything in between.

#

and possibly also rate limits.

tidal bough May 12, 2022, 4:56 PM

#

You can experiment by downloading the few hundred first ones (using something like aiohttp, ideally) and seeing how long that takes. Extrapolate on the full 300k and decide if it's worth it.
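A minimal sketch of that experiment: drop duplicate URLs, download a small sample, time it, and extrapolate to the full set. It uses `urllib.request` from the standard library rather than requests or aiohttp so it has no dependencies. The function names, the `.jpg` extension, and the sample size are assumptions for illustration, and rate limits are not handled.

```python
import time
import urllib.request
from pathlib import Path


def dedupe(urls):
    """Drop repeated URLs while keeping first-seen order."""
    return list(dict.fromkeys(urls))


def fetch_bytes(url, timeout=10):
    """Download one URL and return its raw bytes."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return resp.read()


def download_sample(urls, out_dir, sample_size=200, fetch=fetch_bytes):
    """Save the first `sample_size` unique URLs; return (files_saved, seconds)."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    sample = dedupe(urls)[:sample_size]
    start = time.perf_counter()
    for i, url in enumerate(sample):
        # Assumes the images are JPEGs; the file name is just a running index.
        (out / f"{i:06d}.jpg").write_bytes(fetch(url))
    return len(sample), time.perf_counter() - start


def extrapolate(seconds, sample_count, total_count):
    """Linear estimate of the time needed for `total_count` downloads."""
    return seconds * total_count / sample_count
```

For example, if 200 pictures took 10 seconds, `extrapolate(10.0, 200, 300_000)` gives a rough estimate of 15,000 seconds for a sequential download, before any rate limiting.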

#

ratelimits might be the biggest problem though

#

I wonder if it's possible to, without downloading a file, request its hash or something like that, to avoid redownloading equal files.

serene scaffold May 12, 2022, 4:58 PM

#

I had a similar thought. it's weird that the twitter API exposes the pfp URL but doesn't have an official way to download it. the idea of downloading them all "manually" seems a bit questionable from a TOS standpoint

celest vine May 12, 2022, 5:01 PM

#

I already have the pfp URLs though.

misty flint May 12, 2022, 5:16 PM

#

api rate limiting is def a big hurdle

#

kekHands

serene scaffold May 12, 2022, 5:23 PM

#

@celest vine what are you going to do with these images once you have them?

#

twitter PFPs could be of almost anything, so I'm not sure what one would do with a big dump of them

brazen totem May 12, 2022, 5:32 PM

#

is it good to use SMOTE on a pretty even dataset

vagrant trench May 12, 2022, 5:40 PM

#

guys, does anyone here work with kmeans?

iron rampart May 12, 2022, 5:53 PM

#

tidal bough https://www.tensorflow.org/api_docs/python/tf/keras/datasets/fashion_mnist/load_...

I'm new to AI and all, and I've trained a neural network on it. How could I extend this to real-world images?

misty flint May 12, 2022, 6:07 PM

#

hmm

#

need to try some model stuff on aws

#

guess i should just expect unexpected cloud costs

#

kekHands

glossy mist May 12, 2022, 6:12 PM

#

Hey guys, sorry, I am new to Python. I want to read in this table, which was generated by Adobe Premiere Pro. I used pandas to read it, but it gives a very weird output. Do you know why that is?

serene scaffold May 12, 2022, 6:25 PM

#

glossy mist Hey guys, sorry I am new to python, recently I want to read this table in that i...

please show the code you used to read it as text

#

!code

arctic wedgeBOT May 12, 2022, 6:25 PM

#

Here's how to format Python code on Discord:

```py
print('Hello world!')
```

These are backticks, not quotes. Check this out if you can't find the backtick key.

glossy mist May 12, 2022, 6:34 PM

#

serene scaffold please show the code you used to read it as text

no worries! I just changed the encoding with the pandas function and now it works

glossy mist May 12, 2022, 6:34 PM

#

serene scaffold please show the code you used to read it as text

Thanks though

sullen hazel May 12, 2022, 7:13 PM

#

Guys, any way out there to use decision tree with n-bit values?

burnt iron May 12, 2022, 8:17 PM

#

Hello

#

I am facing a problem

#

actually I'm struggling to find a scalable solution to a problem

#

So here it is:

#

I want to merge two csv-files with soccer data. They hold different data of the same and different games (partial overlap). Normally I would do a merge with df.merge, but the problem is, that the nomenclature differs for some teams in the two Datasets. E.g. "Atletic Bilbao" is called "Club Atletic" in the second set.

#

So the main question is: How could I automatically analyse the differences in the two datasets naming?

#

so dataset 1 looks something like this:

#

Atletic Bilbao   Leicester   2022-05-20 22:00:00 0.2812

#

and dataset two has the same structure but atletic bilbao is called club atletic:

#

hometeam            awayteam    date  
Club Atletic   Leicester   2022-05-20 22:00:00 0.2812

#

there are a few more columns that I want to do analysis on but basically this is what the merge is going to operate on

frigid elk May 12, 2022, 8:31 PM

#

is it not an option to fix the data in one of them? then concat the dataframes together?

burnt iron May 12, 2022, 8:33 PM

#

I mean yes that could work

#

but what if I had more than 2 dataframes

frigid elk May 12, 2022, 8:35 PM

#

you could create a table of master_id mappings to hometeam names, .. then join your dataframes together using master_id.

burnt iron May 12, 2022, 8:36 PM

#

hmm yeah I get what you mean but with .concat

#

I don't have the freedom to choose what algorithm I use for comparing the strings

#

Or dont I?

frigid elk May 12, 2022, 8:40 PM

#

no algorithm needed, just replace all team names (hometeam/awayteam) with the master_id (master_team_name).

#

you could join all dataframes, then group by date and pull any rows that have more than one record to validate your results. ... that would give you a list of team names you'll need to fix

burnt iron May 12, 2022, 8:43 PM

#

ok thanks I'll look into that

odd meteor May 12, 2022, 9:26 PM

#

burnt iron `I want to merge two csv-files with soccer data. They hold different data of the...

Get the unique team names from the df where the team column has issues.
create a dictionary using team names from #1 as key and their respective correct team names as the value/item.
Call the map function on df in #1 and pass your dictionary to it to fix the problem.
Now you can merge your df1 and df2 on team names
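The steps above can be sketched roughly as follows. The two frames and the value columns `odds` and `xg` are made up for illustration; only `hometeam`, `awayteam`, `date` and the two team spellings come from the question.

```python
import pandas as pd

# Hypothetical frames; "odds" and "xg" are invented value columns.
df1 = pd.DataFrame({
    "hometeam": ["Atletic Bilbao"],
    "awayteam": ["Leicester"],
    "date": ["2022-05-20 22:00:00"],
    "odds": [0.2812],
})
df2 = pd.DataFrame({
    "hometeam": ["Club Atletic"],
    "awayteam": ["Leicester"],
    "date": ["2022-05-20 22:00:00"],
    "xg": [1.4],
})

# One reusable dictionary: variant spelling -> canonical spelling.
name_fixes = {"Club Atletic": "Atletic Bilbao"}

for col in ("hometeam", "awayteam"):
    # replace() leaves names that are not in the dictionary untouched,
    # whereas map() would turn them into NaN.
    df2[col] = df2[col].replace(name_fixes)

# With consistent names, the game columns can serve as merge keys.
merged = df1.merge(df2, on=["hometeam", "awayteam", "date"], how="outer")
```

Using `how="outer"` keeps games that appear in only one file, which matches the partial overlap described in the question.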

burnt iron May 12, 2022, 9:27 PM

#

Yeah that sounds good too but again, what if I have more than 2 datasets?

#

I mean the problem's complexity grows exponentially then because for each dataframe I have to have mappings to the other data frames

odd meteor May 12, 2022, 9:31 PM

#

burnt iron Yeah that sounds good too but again, what if I have more than 2 datasets?

That's why you need to create the dictionary; it's reusable. If you have more dfs, then as long as the faulty team names from the new df are present as keys in the existing dictionary, you can reuse it. Otherwise, you can simply update your dictionary and you're good to go.

burnt iron May 12, 2022, 9:36 PM

#

odd meteor That's why you need to create the dictionary; it's reusable. . If you have more ...

Yeah. So that means that I would have a dictionary of team1 of df1 to a list of the corresponding names in df2, df3 etc?

#

Is that what you mean?

odd meteor May 12, 2022, 9:39 PM

#

First thing first. How many data frames do you have? 3?

burnt iron May 12, 2022, 9:43 PM

#

Well

#

Sometimes

#

3

#

Sometimes more

#

But ideally I would like to make it so that the number of dfs doesn't matter

serene scaffold May 12, 2022, 9:46 PM

#

without having read the context, you usually want the number of dataframes in your code to be constant. if there's a step where the number of dataframes is variable, but they all have the same schema, you should be using a multiindex.
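One way to read that suggestion, as a sketch: keep any number of same-schema frames in a dict and let `pd.concat` stack them into a single frame whose outer index level records the source. The source labels and the level names `source` and `row` are assumptions for illustration.

```python
import pandas as pd

# Any number of frames with the same columns, keyed by a made-up source label.
frames = {
    "source_a": pd.DataFrame({"hometeam": ["Atletic Bilbao"], "awayteam": ["Leicester"]}),
    "source_b": pd.DataFrame({"hometeam": ["Club Atletic"], "awayteam": ["Leicester"]}),
}

# Passing a dict makes its keys the outer level of a MultiIndex.
combined = pd.concat(frames, names=["source", "row"])

# All rows from one source can still be pulled out by its label.
only_b = combined.loc["source_b"]
```

The rest of the code then works on `combined` alone, no matter how many files were loaded.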

odd meteor May 12, 2022, 9:58 PM

#

burnt iron Is that what you mean?

Let's presume you have 3 data frames.

df1 ==> the team names in team column are correctly spelt no errors
df2 and df3 ==> not so good

All you need to do is this

combined_df = pd.concat([df2, df3], axis=0)
combined_df['team_names'].unique()

Use the result to create a CSV file with two columns, i.e. faulty team name and correct team name (you'd have to do this part manually in Excel or somewhere).

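team_names_fixed = {}
with open('team_names.csv', encoding='utf-8') as f:
    for line in f.readlines():  # use f.readlines()[1:] if the file has a header row
        # strip the trailing newline so it doesn't end up in good_name
        bad_name, good_name = line.strip().split(',')
        team_names_fixed[bad_name] = good_name

# map() gives NaN for names missing from the dictionary, so fall back to the original name
combined_df['team_names'] = combined_df['team_names'].map(team_names_fixed).fillna(combined_df['team_names'])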

This should be able to fix the problem. If there's a better approach, feel free to try it as well.

burnt iron May 12, 2022, 10:04 PM

#

odd meteor Let's presume you have 3 data frames. 1. df1 ==> the team names in team column ...

I think I understand what your solution is but there are thousands of records in these dfs and I can't really hardcode each difference by hand. What I can do though is use some string matching algorithm to determine which names are most likely the same for all dfs.

#

Which I have done before but now scalability becomes a problem because I would have to hardcode this for every df
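A possible way to keep this scalable, sketched with the standard library's `difflib`: match every dataset's names against one canonical list instead of against each other, so each new dataframe needs only one mapping. The function name and the `cutoff` value are assumptions; names without a confident match come back as `None` for manual review.

```python
from difflib import get_close_matches


def suggest_canonical(names, canonical, cutoff=0.6):
    """Map each name to its closest canonical name, or None if nothing is close."""
    # Compare case-insensitively but return the canonical spelling.
    lookup = {c.lower(): c for c in canonical}
    suggestions = {}
    for name in names:
        hits = get_close_matches(name.lower(), list(lookup), n=1, cutoff=cutoff)
        suggestions[name] = lookup[hits[0]] if hits else None
    return suggestions


canonical_teams = ["Atletic Bilbao", "Leicester", "Real Madrid"]
suggested = suggest_canonical(["Leicester City", "atletic bilbao", "Zzz"], canonical_teams)
```

Pure string similarity has limits: a pair like "Club Atletic" and "Atletic Bilbao" shares only one word and may fall below the cutoff, so the `None` entries still need a human look before they go into the dictionary.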

#

Or I could use this multiindexing thing which I haven't heard about before so Ill definitely look into that as well

ashen umbra May 13, 2022, 1:12 AM

#

hi I am not sure if this is the right channel to ask a ques abt git hub

#

but I have couple of pickle ML models that I want to load on my colab

#

I am getting an error saying no such file or directory

#

does anyone know how to fix that or even load an ML model pickle on colab directly from github?

urban prism May 13, 2022, 1:19 AM

#

I don't think this is the right place but what's the github of the models?

ashen umbra May 13, 2022, 1:33 AM

#

urban prism I don't think this is the right place but what's the github of the models?

u mean the github link to those pickle files?

#

I can dm u the link if u would like

serene scaffold May 13, 2022, 1:41 AM

#

@ashen umbra if the models are pickle files in a git repository on github, then you need to !git clone them into your colab environment.

ashen umbra May 13, 2022, 1:50 AM

#

serene scaffold @ashen umbra if the models are pickle files in a git repository on gith...

it gives me this error:
fatal: destination path 'final_random_forest_model.sav' already exists and is not an empty directory.

#

because I already cloned it in my colab env

#

but not sure how to load it

serene scaffold May 13, 2022, 2:03 AM

#

ashen umbra it gives me this error: fatal: destination path 'final_random_forest_model.sav'...

thanks for giving part of the error message as text. I also need to see the code that caused the error.

#

but if the file is already there, and it's what it's supposed to be, then you need to know what library can open and use that pickle.
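As a sketch, loading a pickled model that is already on disk looks something like this. The helper name is made up. The library that defined the model class has to be importable when unpickling (the file name suggests a scikit-learn random forest, but that is a guess), and pickles should only be loaded from sources you trust.

```python
import pickle
from pathlib import Path


def load_pickled_model(path):
    """Load a pickled object, with a clearer error if the path is wrong."""
    path = Path(path)
    if not path.is_file():
        # "No such file or directory" usually means the path is relative to
        # a different working directory than the one the notebook runs in.
        raise FileNotFoundError(f"{path} not found; check the working directory")
    with path.open("rb") as f:
        return pickle.load(f)
```

After a `!git clone`, the files live inside a folder named after the repository, so the path passed in normally needs that folder as a prefix.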

misty flint May 13, 2022, 2:14 AM

#

something akin to torch.load() or similar?

#

PikaThink

urban prism May 13, 2022, 2:58 AM

#

ashen umbra it gives me this error: fatal: destination path 'final_random_forest_model.sav'...

Does it exist as a file in your local?

ashen umbra May 13, 2022, 4:04 AM

#

serene scaffold thanks for giving part of the error message as text. I also need to see the code...

actually I went ahead with loading the model from Google drive and it worked! thanks so much everyone

ashen umbra May 13, 2022, 4:05 AM

#

urban prism Does it exist as a file in your local?

yess. I added in my local env

cunning parrot May 13, 2022, 6:25 AM

#

uhm... anyone got an idea how i can stop my grafana from connecting two points if there is no value in between? im getting CO2 values, but the device had crashed until like 20 mins ago when i restarted it, so there was no data, but grafana still connected the points. how do i stop that?

it doesnt receive null values, it receives nothing

(not sure if this question is for this channel, if its not just say it to me)

#

celest vine May 13, 2022, 6:59 AM

#

Hi

#

How can I know if two images are similar or not?

weary cloud May 13, 2022, 7:35 AM

#

Hi guys, I am building an artificial intelligence. Does anyone know if there is a standard speech set I can embed in my project? Otherwise I would have to write every sentence myself

rigid summit May 13, 2022, 7:55 AM

#

Hello all
I'm having issues CONCATENATING the output from my LSTM layer and one hot encoding values, that will finally be passed to a dense layer.

Can anyone help ??

bold timber May 13, 2022, 9:40 AM

#

Hi, I have a question about text data: What is get_feature_names()? Why does it make more words than stopwords?

tawny vine May 13, 2022, 10:03 AM

#

sure

young granite May 13, 2022, 10:20 AM

#

someone with a bit of plotly 3dsurface knowledge in here to help me in #help-bread

glacial sparrow May 13, 2022, 12:24 PM

#

anyone know any website where a shapes file for Taipei is available for download?

#

previously I used this https://download.geofabrik.de/asia/taiwan.html but it does not print the whole thing as the image shows

loud cove May 13, 2022, 1:56 PM

#

Wouldn't this be 1.8/3 = 0.6?

hidden frigate May 13, 2022, 2:15 PM

#

I'm struggling a bit with handling wheelEvents in pyqtgraph. Does anyone with pyqtgraph/qt experience have a moment to provide a bit of guidance?

steep oyster May 13, 2022, 2:50 PM

#

Can someone clearly explain to me (a 7th-grade boy) how this works exactly? I never learned about circles, and I'm unsure how can sine need a list instead of opposite/hypotenuse

#

Ping me, thanks.

#

Also i'm sorry if asking both in a help channel AND here is against the rules, I'll delete this one if it's against.

rain sand May 13, 2022, 3:12 PM

#

is there anyone who can help me out with a machine learning project

steep oyster May 13, 2022, 3:20 PM

#

@rain sand don't ask to ask

rain sand May 13, 2022, 3:20 PM

#

steep oyster @rain sand don't ask to ask

so

#

??

green wasp May 13, 2022, 3:31 PM

#

So I have a quickie question. I managed to get live flight data from flightradar24 and it only returns 1500 elements each call. I was wondering if it would be smarter to keep it open as a stream, since it seems to support it, or make a 50ms long request every 5 seconds

#

My objective is to gather data and make some dashboards using django and some other libraries (not sure which, but I’ll find some) and study stuff like: how many flights leave from an airport, how many are intercontinental and how many are extracontinental, how many are interstate and how many extrastate, and so on

#

I was debating using elasticsearch and kibana for storage and visualization, but I decided to opt for mongodb and a custom django site because it’s a teaching experience

#

So my main question is: stream the data, or make periodic requests a few ms long? Because 1500 JSON entries are not enough, and I’m not sure they’re different entries each time

rain sand May 13, 2022, 3:39 PM

#

why i cannot ask

#

here

#

???

#

is there any reason behind it

#

???

wicked pike May 13, 2022, 3:42 PM

#

Hi guys i have a question on pandas

sand cliff May 13, 2022, 4:03 PM

#

Hey guys, attempting to extract a google sheet content into a dataframe, feeding it into an URL:

url = f"https://docs.google.com/spreadsheets/d/{sheet_id}/gviz/tq?tqx=out:csv&sheet=Today's Records"
df = pd.read_csv(url)

As you can see I have a control character in the Today's Records (') and it's throwing an error as URL cannot contain control characters, unfortunately, the sheet is named thus and I cannot change it. Anyone know if it can be encoded or a workaround?
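A small sketch of one workaround: percent-encode the sheet name with `urllib.parse.quote` before putting it into the URL. The space in the name is likely the character being rejected, and `quote` encodes both it and the apostrophe. The `sheet_id` value here is a placeholder.

```python
from urllib.parse import quote

sheet_id = "SHEET_ID"  # placeholder, not a real spreadsheet id
sheet_name = "Today's Records"

# quote() turns the space into %20 and the apostrophe into %27.
url = (
    f"https://docs.google.com/spreadsheets/d/{sheet_id}"
    f"/gviz/tq?tqx=out:csv&sheet={quote(sheet_name)}"
)
```

The resulting `url` can then be passed to `pd.read_csv(url)` as before.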

shadow halo May 13, 2022, 4:09 PM

#

Hello people, I'm working on getting better at handling different datasets, and I got one with 60 samples of olive oils from 4 regions, each sample with 570 features. I want to make an MLP that can classify them. The best I could get is an accuracy of 0.87 with 2 hidden layers of 100 neurons. I want to do better than that, and changing the topology doesn't get me any further. So I'm kind of lost on how I should process my data, given that I already passed it through a StandardScaler before training. I'm looking for a way to reduce the number of features to maybe get better results, knowing that the values of said features are spectrographic values.

#

wicked pike May 13, 2022, 5:47 PM

#

how do i filter latest timestamp hourly in this df?

serene scaffold May 13, 2022, 5:48 PM

#

wicked pike how do i filter latest timestamp hourly in this df?

you could groupby hour and take the max

wicked pike May 13, 2022, 5:49 PM

#

say if i want to keep the df?

#

does drop_duplicates work?

serene scaffold May 13, 2022, 5:49 PM

#

wicked pike say if i want to keep the df?

pandas operations pretty much always return copies. it's evident in the code when something you're doing overwrites your existing data
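A minimal sketch of the groupby idea, using made-up data and hypothetical column names `timestamp` and `value`: bucket each timestamp by hour, find the row with the latest timestamp in each bucket, and select those rows. The original `df` is left unchanged.

```python
import pandas as pd

# Made-up measurements; the real frame has different columns.
df = pd.DataFrame({
    "timestamp": pd.to_datetime([
        "2022-05-13 10:05",
        "2022-05-13 10:40",
        "2022-05-13 11:15",
        "2022-05-13 11:59",
    ]),
    "value": [1, 2, 3, 4],
})

# Round each timestamp down to the start of its hour.
hour = df["timestamp"].dt.floor("h")

# idxmax() gives the index label of the latest timestamp in each hour;
# .loc then pulls the complete rows for those labels into a new frame.
latest_per_hour = df.loc[df.groupby(hour)["timestamp"].idxmax()]
```

Because `.loc` returns a new frame, `df` keeps all of its rows and `latest_per_hour` holds one row per hour.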

still wind May 13, 2022, 6:00 PM

#

Hey guys I am running tensorflow with gpt2, I am trying to run the sequence generator file and I followed all the other steps. This mf been running for a day, how long does it take to finish and what do I do after that lol

serene scaffold May 13, 2022, 6:01 PM

#

still wind Hey guys I am running tensorflow with gpt2, I am trying to run the sequence gene...

are you using a GPU?

odd meteor May 13, 2022, 6:07 PM

#

still wind Hey guys I am running tensorflow with gpt2, I am trying to run the sequence gene...

it's been running for 24+ hours now? Like, you've slept, raved, eaten, woken up and it's still running? 😃 How large is the data? Are you using a GPU?

serene scaffold May 13, 2022, 6:08 PM

#

the question on everyone's mind, apparently

odd meteor May 13, 2022, 6:10 PM

#

serene scaffold the question on everyone's mind, apparently

Lol, hopefully it doesn't extend to 48hrs

raw mortar May 13, 2022, 6:24 PM

#

cunning parrot uhm... anyone got an idea how i can stop my grafana from connecting two points i...

A bit late, but under the display section for the panel, you can toggle points and disable lines. There are some other settings in there too

misty flint May 13, 2022, 6:24 PM

#

green wasp So my main question is, stream the data or make periodic requests a few ms long...

you could go either way. if you use mongodb, mongodb atlas allows the dashboards to update according to the time interval you set

#

i think the default for the "live dashboards" is 30s

#

PikaThink

steady basalt May 13, 2022, 6:25 PM

#

Yooo got a m1 mac arriving tomorrow

misty flint May 13, 2022, 6:25 PM

#

for streaming, i believe you can use kafka or something similar

steady basalt May 13, 2022, 6:25 PM

#

Finally goin to use a non potato

misty flint May 13, 2022, 6:25 PM

#

dancingpug

#

nice

steady basalt May 13, 2022, 6:26 PM

#

Does the gpu count as a gpu?

#

How does it fare with TF

misty flint May 13, 2022, 6:26 PM

#

ive heard their gpu is actually one of the better ones for ML (compared to older stuff)

steady basalt May 13, 2022, 6:26 PM

#

I might run some benchmarks if anyone’s interested

#

See how it stacks up v my old 2015 i5

misty flint May 13, 2022, 6:26 PM

#

but this is info from podcasts so dont know how accurate it is

steady basalt May 13, 2022, 6:26 PM

#

I believe I need to use Metal

misty flint May 13, 2022, 6:27 PM

#

steady basalt I might run some benchmarks if anyone’s interested

i would be interested

#

please ping me if you do so

steady basalt May 13, 2022, 6:27 PM

#

It won’t be anything thorough, just a simple dataset and maybe a simple Cnn

#

And just time them

misty flint May 13, 2022, 6:27 PM

#

results are results

#

still would be interesting to see

steady basalt May 13, 2022, 6:27 PM

#

I’m estimating it’s going to triple training speed

misty flint May 13, 2022, 6:28 PM

#

lol dont get your hopes up before it arrives

steady basalt May 13, 2022, 6:28 PM

#

But probably not beat co lab by much

#

Co lab gpus aren’t amazing tho

misty flint May 13, 2022, 6:28 PM

#

yeah

#

most likely

steady basalt May 13, 2022, 6:28 PM

#

I’ll try all 3

#

It may depend on other stuff tho

#

Such as how I run it on arm

cunning parrot May 13, 2022, 6:29 PM

#

raw mortar A bit late, but under display section for the panel, you can toggle points and d...

fixed it by using $__timeGroup(timestamp, '5m', 0) macro

steady basalt May 13, 2022, 6:29 PM

#

How do people access more powerful gpu on co lab

#

https://betterdatascience.com/macbook-m1-vs-google-colab/amp/

Better Data Science

MacBook M1 vs. Google Colab for Data Science - Unexpected Results |...

Expensive laptop (M1 MacBook Pro) vs. entirely free Google Colab for data science and machine learning? Here are the differences in TensorFlow.

#

Surprising

#

However

#

https://m.youtube.com/watch?v=JWYsWhR3Pxg

YouTube

Daniel Bourke

Apple's M1 Pro and M1 Max are faster than Google Colab (machine lea...

Let's see how Apple's new M1 Pro and M1 Max deal with various machine learning workloads.

Blog post with results - https://www.mrdbourke.com/m1-pro-m1-max-machine-learning-speed-test-comparison
Code on GitHub - https://github.com/mrdbourke/m1-machine-learning-test
Setup your M1 Mac for machine learning video - https://youtu.be/_1CaUOHhI6U


#

Tbh, google co lab is OP for small projects

#

I would just stick with a cheap laptop and use that if my current one didn’t get super bad

#

And I like shiny new things

#

Plus u don’t need internet

quartz raptor May 13, 2022, 6:47 PM

#

i have a 1billion line sorted csv file and i would like to find a certain entry with binary search, can pandas do this?

#

i can implement the binary search myself, by jumping to the approx middle of the file using file.seek() and then find the next line but is there nothing better?

steady basalt May 13, 2022, 7:10 PM

#

quartz raptor i have a 1billion line sorted csv file and i would like to find a certain entry ...

What is the data

#

Can’t u just do it the old fashioned way using min max list and divide by two ?

ripe garnet May 13, 2022, 7:11 PM

#

steady basalt https://m.youtube.com/watch?v=JWYsWhR3Pxg

wowwww

steady basalt May 13, 2022, 7:12 PM

#

What?

quartz raptor May 13, 2022, 7:12 PM

#

steady basalt Can’t u just do it the old fashioned way using min max list and divide by two ?

what do you mean by min max list?

steady basalt May 13, 2022, 7:12 PM

#

@quartz raptor do u know binary search alg

quartz raptor May 13, 2022, 7:12 PM

#

yes

steady basalt May 13, 2022, 7:12 PM

#

Like the one to solve that leetcode question

#

Is it possible to turn a Normal list into nodes?

quartz raptor May 13, 2022, 7:13 PM

#

the file is about 100gb

steady basalt May 13, 2022, 7:13 PM

#

In fact u don’t even need nodes surely

quartz raptor May 13, 2022, 7:13 PM

#

i cant load that into memory

steady basalt May 13, 2022, 7:13 PM

#

Where’s this data from

#

U gona need some serious power

quartz raptor May 13, 2022, 7:13 PM

#

binance all trades since 2017

steady basalt May 13, 2022, 7:14 PM

#

What are u looking for

quartz raptor May 13, 2022, 7:14 PM

#

no i can do it in few ms if i do it myself

#

but i want it even faster

#

im just doing some datascience on it

steady basalt May 13, 2022, 7:14 PM

#

Well what are u looking for in the search

quartz raptor May 13, 2022, 7:14 PM

#

yes so the data is 1billion rows of trades, sorted by timestamp

#

and i would like to get a dataframe of all trades between two timestamps

steady basalt May 13, 2022, 7:15 PM

#

What cpu u using

#

And ram

quartz raptor May 13, 2022, 7:15 PM

#

its not a problem

#

i have to just not open the file all at once

steady basalt May 13, 2022, 7:15 PM

#

Just do normal pandas selection then

#

Try 20gb at a time

#

That’s manageable

quartz raptor May 13, 2022, 7:15 PM

#

no that opens the whole file

steady basalt May 13, 2022, 7:15 PM

#

No load it in

#

In chunks

#

Load the first 20 gn

quartz raptor May 13, 2022, 7:15 PM

#

yes thats still slow

steady basalt May 13, 2022, 7:16 PM

#

Gb

quartz raptor May 13, 2022, 7:16 PM

#

it will still have to load everything eventually

steady basalt May 13, 2022, 7:16 PM

#

When u specify read csv

#

There’s a way I think to just take the first x rows

quartz raptor May 13, 2022, 7:16 PM

#

yes i know

#

but

#

that still reads all lines one by one just doesnt save them in memory

#

that still takes like a minute

steady basalt May 13, 2022, 7:16 PM

#

I’m sorry I don’t know a faster method

quartz raptor May 13, 2022, 7:17 PM

#

i think i will do binary search with file.seek() and then pass the filestream to pandas

#

was just wondering if there exists a library that does that in c
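A rough pure-Python sketch of that plan, not a C library: binary search over byte offsets to find the first line at or after a timestamp, then read forward until the end of the range. It assumes no header row, integer timestamps sorted ascending, and the timestamp in column `col` (0 here for brevity); all function names are made up.

```python
import os


def _ts(line: bytes, col: int) -> int:
    """Integer timestamp stored in column `col` of a raw CSV line."""
    return int(line.split(b",")[col])


def _next_line_start(f, pos: int) -> int:
    """Offset of the first line that starts at or after byte `pos`."""
    if pos == 0:
        return 0
    f.seek(pos - 1)
    f.readline()  # consume the rest of the line containing byte pos - 1
    return f.tell()


def find_first_at_or_after(path: str, target: int, col: int = 0) -> int:
    """Byte offset of the first line whose timestamp is >= target."""
    with open(path, "rb") as f:
        lo, hi = 0, os.path.getsize(path)
        while lo < hi:
            mid = (lo + hi) // 2
            f.seek(_next_line_start(f, mid))
            line = f.readline()
            # An empty read means end of file, which counts as "past target".
            if not line or _ts(line, col) >= target:
                hi = mid
            else:
                lo = mid + 1
        return _next_line_start(f, lo)


def read_range(path: str, start_ts: int, end_ts: int, col: int = 0):
    """Rows with start_ts <= timestamp <= end_ts, as lists of strings."""
    offset = find_first_at_or_after(path, start_ts, col)
    rows = []
    with open(path, "rb") as f:
        f.seek(offset)
        for line in f:
            if _ts(line, col) > end_ts:
                break
            rows.append(line.decode().rstrip("\r\n").split(","))
    return rows
```

Each lookup touches only a few dozen lines regardless of file size, and the rows returned by `read_range` can be passed to `pd.DataFrame(rows)` if a dataframe is needed.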

steady basalt May 13, 2022, 7:17 PM

#

The biggest dataset I’ve ever worked on is 20gb

#

I’ve never needed to deal with this much

#

I’m sure there’s a solution

#

If u only need to find this once just do it the normal way

quartz raptor May 13, 2022, 7:19 PM

#

no i need to find it every batch

#

so i would like it to be less than 100ms

steady basalt May 13, 2022, 7:20 PM

#

The search or loading?

quartz raptor May 13, 2022, 7:20 PM

#

give me like 10mins ill show the code

#

well both, but once the search is done i will only load like 10k lines

steady basalt May 13, 2022, 7:20 PM

#

Load the csv in first entirely and then do one single search

quartz raptor May 13, 2022, 7:20 PM

#

which is fast enough

steady basalt May 13, 2022, 7:20 PM

#

One batch is 10k?

quartz raptor May 13, 2022, 7:20 PM

#

yes

#

well idk yes

steady basalt May 13, 2022, 7:20 PM

#

I don’t understand, if it’s time ordered you’d know roughly how much lines u need

#

So load until the time stamps out of desired range

#

if it’s only a small period of time you can cut down excess and ur left to work on your ?2000 rows of data

tacit basin May 13, 2022, 7:23 PM

#

Let dask take care of batching and stuff 😜

steady basalt May 13, 2022, 7:23 PM

#

I think keeping things under X milliseconds and using binary searches is getting into SWE territory

#

Is the issue loading the data in or optimising search once it’s loaded

quartz raptor May 13, 2022, 7:25 PM

#

from what i understand dask is to have big data in memory

#

i dont need alot in memory actually

steady basalt May 13, 2022, 7:25 PM

#

You’d have found the data u want to work on by now if u just loaded ur csv in and ran a search

quartz raptor May 13, 2022, 7:26 PM

#

no you dont get it

#

i will use all the data

#

just not all at once

#

so i would have to do that over and over and over again

steady basalt May 13, 2022, 7:26 PM

#

Do it at once

#

Ur specs are elite

#

It will take 10 min

#

And maybe doing it over and over would be faster anyway

#

Try

#

Where did u get this data from is it public

#

They actually posted their entire transaction history and don't make you download yearly files instead?

frigid elk May 13, 2022, 7:30 PM

#

can you convert the csv to parquet and load using predicate pushdown?

#

you could take it a step further and partition the parquet by date ranges

wicked pike May 13, 2022, 7:32 PM

#

how would i filter to get the latest timestamp in this diagram?

frigid elk May 13, 2022, 7:32 PM

#

you could take it a step further and utilize spark to do the heavy lifting

steady basalt May 13, 2022, 7:33 PM

#

wicked pike how would i filter to get the latest timestamp in this diagram?

Filter or order?

wicked pike May 13, 2022, 7:34 PM

#

i would say filter to get a output like this

steady basalt May 13, 2022, 7:34 PM

#

In general or just for this specific case

#

You want 4 and 9 only

wicked pike May 13, 2022, 7:35 PM

#

for this specific case

steady basalt May 13, 2022, 7:35 PM

#

Then just Index it

#

Much easier than having to filter for biggest time

#

It seems u already done so

wicked pike May 13, 2022, 7:36 PM

#

okay my bad, i forgot to mention there are 74520 rows

#

so i guess in general

#

i just print the head for that case

steady basalt May 13, 2022, 7:37 PM

#

You want to return the two highest values ?

#

In entire data

#

Sort in order by time stamp and take the top two

wicked pike May 13, 2022, 7:37 PM

#

the highest time stamp for each hour

steady basalt May 13, 2022, 7:37 PM

#

I’d create new columns for each hour

wicked pike May 13, 2022, 7:38 PM

#

and the measurement timestamp is already sorted by ascending

steady basalt May 13, 2022, 7:38 PM

#

Now have 24 hours

#

Columns

#

Easy to find top from each

#

Conditional index hour 01:00 time stamps to 24:00

#

Or I guess u can just do a search that way

#

Using order and take the top row

#

No need to make columns

wicked pike May 13, 2022, 7:40 PM

#

steady basalt Using order and take the top row

what do you mean by this?

steady basalt May 13, 2022, 7:40 PM

#

U mean without making new columns ?

wicked pike May 13, 2022, 7:41 PM

#

yes without making new columns

steady basalt May 13, 2022, 7:41 PM

#

Locate indexes where hour = 15:00 order by time stamp iloc 0 th row?

#

I’d do that 24 times cause I forgot how to return a data frame for each one in one command

#

U might be better off using sql?

wicked pike May 13, 2022, 7:44 PM

#

im trying to learn and practice my pandas foundations as i suck at it lool

#

but i appreciate the help

steady basalt May 13, 2022, 7:46 PM

#

Google best for syntax

#

I think u can double sort

pale vortex May 13, 2022, 8:31 PM

#

How do I replace all columns in a dataframe with a single list? I have a dataframe of shape (5,8) and a list of shape (5,) which I called a. Doing df[:] = a raises a could not broadcast shape (5,) to (5,8) error.

tidal bough May 13, 2022, 8:43 PM

#

Hmm, maybe do df[:] = np.asarray(a).reshape(-1, 1) (a plain list has no reshape, so it has to become an array first)

#

worst case scenario, you'd need to manually broadcast a to the right shape

serene scaffold May 13, 2022, 8:50 PM

#

pale vortex How do I replace all columns in a dataframe with a single list? I have a datafra...

I'm not sure I follow. are you trying to rename the columns or what?

pale vortex May 13, 2022, 8:52 PM

#

serene scaffold I'm not sure I follow. are you trying rename the columns or what?

I have an empty DF of size (5,8), I just want to replace all columns with a list of size 5

serene scaffold May 13, 2022, 8:53 PM

#

pale vortex I have an empty DF of size (5,8), I just want to replace all columns with a list...

I don't understand. columns are just a vertical slice of the data. if the shape of the dataframe is (5, 8), then you have five rows and 8 columns.

#

also, why do you have an empty dataframe in the first place?

pale vortex May 13, 2022, 8:56 PM

#

I have a list of five numbers [1,2,3,4,5]. I wanted to get a DF that has each column as that list of numbers, so in this case eight columns which look like:

#

my strategy was to create an empty df of size (5,8), and fill them all in one go

serene scaffold May 13, 2022, 8:58 PM

#

In [1]: arr = np.arange(1, 6)

In [2]: arr
Out[2]: array([1, 2, 3, 4, 5])

In [6]: arr.reshape(-1, 1).repeat(8, axis=1)
Out[6]:
array([[1, 1, 1, 1, 1, 1, 1, 1],
       [2, 2, 2, 2, 2, 2, 2, 2],
       [3, 3, 3, 3, 3, 3, 3, 3],
       [4, 4, 4, 4, 4, 4, 4, 4],
       [5, 5, 5, 5, 5, 5, 5, 5]])
In [9]: pd.DataFrame(arr.reshape(-1, 1).repeat(8, axis=1))
Out[9]:
   0  1  2  3  4  5  6  7
0  1  1  1  1  1  1  1  1
1  2  2  2  2  2  2  2  2
2  3  3  3  3  3  3  3  3
3  4  4  4  4  4  4  4  4
4  5  5  5  5  5  5  5  5

#

or, as a stand-alone expression, pd.DataFrame(np.arange(1, 6).reshape(-1, 1).repeat(8, axis=1))

#

with numpy/pandas, you don't want to allocate empty space for data and fill it later. the library supports doing these sorts of things in one go.

quartz raptor May 13, 2022, 9:00 PM

#

steady basalt Do it at once

so i did this: ```python
from timeit import timeit
from pathlib import Path
import os

from typing import IO
import pandas as pd

FILE_NAME = Path(__file__).parent / '../data/trades.csv'
MAX_LINE_LENGTH = 256


def get_timestamp_from_line(line: str, col: int = 4) -> int:
    r = line.split(',')
    r = r[col]
    return int(r)


def get_line(file: IO, cursor: int) -> str:
    '''return the line in a file containing the cursor'''
    file.seek(cursor)

    # walk backwards to the newline that precedes the current line
    char = ''
    while char != '\n':
        cursor -= 1
        file.seek(cursor)
        char = file.read(1)
    return file.readline()


def binary_search_interval(from_timestamp: int, to_timestamp: int) -> pd.DataFrame:

    with open(FILE_NAME, 'r') as file:
        file.seek(0, os.SEEK_END)
        cursor = jump = file.tell() // 2  # start at the middle of the file (size in bytes // 2)

        # get cursor to before the line containing from_timestamp
        while jump > 1:
            # get timestamp at current step
            line = get_line(file, cursor)
            timestamp = get_timestamp_from_line(line)

            jump //= 2
            if timestamp < from_timestamp:
                cursor += jump
            elif timestamp > from_timestamp:
                cursor -= jump
            else:
                break
        start = file.tell()
        num = 0
        # count lines until the first timestamp past the end of the interval
        while True:
            line = file.readline()
            if get_timestamp_from_line(line) > to_timestamp:
                break
            num += 1
        file.seek(start)
        return pd.read_csv(file, nrows=num - 1)


def test_pd():
    df = pd.read_csv(FILE_NAME, nrows=1000, skiprows=10000000)
    print(df)
    ...


def test_bs():
    df = binary_search_interval(1620255222803, 1620255249925)
    print(df)


def main() -> int:
    t1 = timeit(test_pd, number=1)
    t2 = timeit(test_bs, number=1)
    print(t1)
    print(t2)
    return 0  # exit status 0 signals success


if __name__ == '__main__':
    raise SystemExit(main())
```

serene scaffold May 13, 2022, 9:01 PM

#

I'll break down what this does.

pd.DataFrame(  # convert it to a DF at the end
    np.arange(1, 6)  # make an array from [1, 6)
    .reshape(-1, 1)  # make it a column
    .repeat(8, axis=1)  # repeat that column 8 times along axis 1
)

pale vortex May 13, 2022, 9:02 PM

#

I see, this makes sense. If I had a pandas series instead of an array, would this reshape and repeat logic still work?

serene scaffold May 13, 2022, 9:02 PM

#

pale vortex I see, this makes sense. If I had a pandas series instead of an array, would thi...

try it 😄
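
For reference, a quick sketch of the Series case. As far as I know a `pd.Series` has no `reshape` method in current pandas, so the first option below converts to NumPy with `to_numpy()` before reusing the reshape/repeat logic; the second option stays in pandas with `pd.concat`. The variable names are illustrative only.

```python
import pandas as pd

s = pd.Series([1, 2, 3, 4, 5])

# Option 1: go through NumPy, then reuse the reshape/repeat logic.
via_numpy = pd.DataFrame(s.to_numpy().reshape(-1, 1).repeat(8, axis=1))

# Option 2: stay in pandas and place eight copies of the Series side by side.
via_concat = pd.concat([s] * 8, axis=1, ignore_index=True)

print(via_numpy.shape, via_concat.shape)
```

Both results should be 5 rows by 8 columns, with every column equal to the original Series.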

quartz raptor May 13, 2022, 9:03 PM

#

quartz raptor so i did this: ```python from timeit import timeit from pathlib import Path impo...

the binary search runs in 7ms while pd.read_csv takes 2seconds, keep in mind that this is only a 4gb subset of the actual data so pd.read_csv should take about 40seconds on the full data while mine will stay at pretty much the same speed

pale vortex May 13, 2022, 9:03 PM

#

serene scaffold try it 😄

sure

steady basalt May 13, 2022, 9:04 PM

#

quartz raptor the binary search runs in 7ms while `pd.read_csv` takes 2seconds, keep in mind t...

Are you developing software ?

#

It’s cool and stuff but unless I have a global template I’m cool waiting a few seconds before I look at the data

quartz raptor May 13, 2022, 9:06 PM

#

steady basalt Are you developing software ?

you mean as a job? no. I study math

steady basalt May 13, 2022, 9:06 PM

#

I mean with this data

quartz raptor May 13, 2022, 9:07 PM

#

well i guess im just trying stuff out

steady basalt May 13, 2022, 9:07 PM

#

U said u only wanted to work on a subset of the data

#

Ml project?

quartz raptor May 13, 2022, 9:07 PM

#

well ok, so i am trying to tokenize this 100gb data into smaller chunks using an auto-encoder

steady basalt May 13, 2022, 9:08 PM

#

Unless ur constantly working with 100gb files it’s prob faster to wait for a csv to load than write 200 lines making a binary search

quartz raptor May 13, 2022, 9:08 PM

#

and the auto encoder will have some attention span which has to get loaded at the same time

#

so to train i will have to load random chunks of the data for each batch

#

so i have to be able to read the whole file very fast

steady basalt May 13, 2022, 9:08 PM

#

But you only want 2017 days don’t u

quartz raptor May 13, 2022, 9:09 PM

#

no why?

steady basalt May 13, 2022, 9:09 PM

#

I thought u said

quartz raptor May 13, 2022, 9:09 PM

#

i have all the data from 2017 to 2022

#

no

steady basalt May 13, 2022, 9:09 PM

#

What’s the project aim

#

What are u predicting

#

Or classifying

quartz raptor May 13, 2022, 9:10 PM

#

well so im trying to see how much noise vs information is in the btc (or any crypto)'s price.

steady basalt May 13, 2022, 9:10 PM

#

How will u do that

#

(I didn’t graduate math)

quartz raptor May 13, 2022, 9:10 PM

#

and if it turns out that the market is inefficient i.e. the price can be predicted and isnt just random then i can maybe make a trading bot

#

well im also just a first year, so i didnt learn any of this (yet)

steady basalt May 13, 2022, 9:11 PM

#

First year math ?

quartz raptor May 13, 2022, 9:11 PM

#

yes

steady basalt May 13, 2022, 9:11 PM

#

This is hella hard project bro

#

Is it ur second degree or?

quartz raptor May 13, 2022, 9:12 PM

#

yeah i dont really need to hit the goal, its just about learning

#

no my first

steady basalt May 13, 2022, 9:12 PM

#

Did u already do smaller ML projects

quartz raptor May 13, 2022, 9:13 PM

#

actually i did some work on differential equation solvers using ml

steady basalt May 13, 2022, 9:13 PM

#

I’m curious, where did you learn to code like this as a first year math student

quartz raptor May 13, 2022, 9:13 PM

#

and i managed to outperform state of the art in some metrics

#

oh im programming for like 8 years since im 14 or so

steady basalt May 13, 2022, 9:14 PM

#

How is ml solving your equations?

#

Bruh ur a first year student outperforming soa ML?

quartz raptor May 13, 2022, 9:15 PM

#

i wrote a paper, but its in german

steady basalt May 13, 2022, 9:15 PM

#

We’re talking phd here not bs right

quartz raptor May 13, 2022, 9:15 PM

#

no its not soa in ML it was outperforming soa in regular algorithms

#

but also only in very specific metrics

#

so its somewhat cheating, but still interesting

steady basalt May 13, 2022, 9:16 PM

#

I thought I was pretty nerdy

#

At 18

#

There’s a new generation 😂

quartz raptor May 13, 2022, 9:16 PM

#

youre 18?

#

or when you were 18

steady basalt May 13, 2022, 9:16 PM

#

No I’m 23

quartz raptor May 13, 2022, 9:17 PM

#

oh ok, well even then at 23 you have a lot of time

steady basalt May 13, 2022, 9:17 PM

#

I can tell u no one I’ve ever met has been writing papers at 18…

#

How old are you?

quartz raptor May 13, 2022, 9:17 PM

#

21 now

steady basalt May 13, 2022, 9:17 PM

#

In Germany u start uni at 20?

quartz raptor May 13, 2022, 9:18 PM

#

switzerland

steady basalt May 13, 2022, 9:18 PM

#

Here we are first year at 18

quartz raptor May 13, 2022, 9:18 PM

#

yes because i had to go to the army for a year

steady basalt May 13, 2022, 9:18 PM

#

Do u have a rifle?

quartz raptor May 13, 2022, 9:18 PM

#

we have an extra year of highschool compared to everywhere else

steady basalt May 13, 2022, 9:19 PM

#

How did you write machine learning papers at such an age

#

How do you find supervisor lol

green wasp May 13, 2022, 9:34 PM

#

misty flint you could go either way. if you use mongodb, mongodb atlas allows the dashboards...

Interesting. Mongodb has live dashboards? I didn’t know, how does that work? I wanted to make an interactive dashboard maker and visualizer in django, looks like there are some libraries that make it relatively easy, but I never knew mongo had something for it.

misty flint May 13, 2022, 9:35 PM

#

green wasp Interesting. Mongodb has live dashboards? I didn’t know, how does that work? I w...

checkout mongodb atlas

green wasp May 13, 2022, 9:36 PM

#

misty flint for streaming, i believe you can use kafka or something similar

Kafka still needs something to produce events. It needs a producer, but that could be it yeah, I’ll look at it. I’ve used kafka at work but we always had either a beat producing to it or a lightweight application doing it

#

I never hooked kafka up to an api and I’m not sure it can be done but maybe I’ve never used it to its full capabilities

misty flint May 13, 2022, 9:37 PM

#

you seem to know more than me about kafka. let me know what you find out though. im curious

#

pithink

green wasp May 13, 2022, 9:43 PM

#

From a quick search you could, in theory, write a connector that fetches from a rest api or use this https://www.progress.com/tutorials/jdbc/import-data-from-any-rest-api-to-kafka-incrementally-using-jdbc

Progress.com

Incremental Kafka REST API Connector Using JDBC Example

In this tutorial you’ll learn how to import data from any REST API using Autonomous REST Connector and ingest that data into Apache Kafka. Using Alpha Vantage API as an example.

#

I don’t like that it involves jdbc though

#

This looks promising

#

https://github.com/llofberg/kafka-connect-rest

GitHub

GitHub - llofberg/kafka-connect-rest: Kafka Connect REST connector

Kafka Connect REST connector. Contribute to llofberg/kafka-connect-rest development by creating an account on GitHub.

#

So in theory I wouldn’t need a python fetcher at all, just kafka pushing to mongo and a django site for the dashboards

wise falcon May 13, 2022, 10:26 PM

#

Hi! I am using PySpark to read, cleanse and write csv files (the files are more than 10gb), after I read the csv file and work with it I try to save it with dataframe.write but it saves it into multiple parts, how could I save the csv file into only one big csv?

ocean swallow May 13, 2022, 10:29 PM

#

bro, google just sent me an alert of aftershock and I was like what? weirdo.

#

20 seconds later earthquake happened

#

future is now

serene scaffold May 13, 2022, 11:02 PM

#

ocean swallow 20 seconds later earthquake happened

nice. how far were you from the epicenter?

plush jungle May 14, 2022, 12:30 AM

#

i'm trying to run the stylegan2 code here https://github.com/johndpope/stylegan2-ada

#

but I can't figure out how to run this command in terminal

#

# Generate curated MetFaces images without truncation (Fig.10 left)
python generate.py --outdir=out --trunc=1 --seeds=85,265,297,849 \
    --network=https://nvlabs-fi-cdn.nvidia.com/stylegan2-ada/pretrained/metfaces.pkl

#

if I copy paste it into terminal, it only runs the first line until it gets to the \ and newline

#

if I get rid of the newline and run it all as one line, it gives this

#

D:\Python\stylegan2-ada\stylegan2-ada-main>py generate.py --outdir=out --trunc=1 --seeds=85,265,297,849 \ --network=https://nvlabs-fi-cdn.nvidia.com/stylegan2-ada/pretrained/metfaces.pkl
2022-05-13 20:29:26.269912: W tensorflow/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'cudart64_110.dll'; dlerror: cudart64_110.dll not found
2022-05-13 20:29:26.270276: I tensorflow/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine.
WARNING:tensorflow:From C:\Users\mac\AppData\Local\Programs\Python\Python39\lib\site-packages\tensorflow\python\compat\v2_compat.py:107: disable_resource_variables (from tensorflow.python.ops.variable_scope) is deprecated and will be removed in a future version.
Instructions for updating:
non-resource variables are not supported in the long term
usage: generate.py [-h] {generate-images,truncation-traversal,generate-latent-walk,generate-neighbors,lerp-video} ...
generate.py: error: argument command: invalid choice: '\\' (choose from 'generate-images', 'truncation-traversal', 'generate-latent-walk', 'generate-neighbors', 'lerp-video')
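
A note on the error above: a trailing `\` is a line continuation in POSIX shells, but the Windows `cmd` prompt does not treat it that way (its continuation character is `^`), so the backslash ends up passed to `generate.py` as a literal argument. The usage line also suggests that this fork expects a subcommand such as `generate-images` before the flags. The sketch below builds the argument list explicitly in Python so no shell continuation is involved. Whether `generate-images` accepts exactly these flags is an assumption taken from the usage message, not something confirmed in the chat.

```python
import subprocess
import sys

NETWORK_URL = "https://nvlabs-fi-cdn.nvidia.com/stylegan2-ada/pretrained/metfaces.pkl"


def build_command(subcommand="generate-images"):
    """Return the argument list; each item reaches generate.py as one argument."""
    return [
        sys.executable, "generate.py",
        subcommand,  # assumption: name taken from the usage message in the error output
        "--outdir=out",
        "--trunc=1",
        "--seeds=85,265,297,849",
        f"--network={NETWORK_URL}",
    ]


def run_generate():
    # Call this from the repository folder that contains generate.py.
    subprocess.run(build_command(), check=True)


print(" ".join(build_command()))
```

Typing the same arguments on a single line in `cmd`, without the backslash, should be equivalent.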

misty flint May 14, 2022, 1:18 AM

#

ocean swallow bro, google just sent me an alert of aftershock and I was like what? weirdo.

wait which google service

#

like

#

do you have a google pixel?

#

ID_blurryeyes

#

im curious if that would also happen to me if i was near an earthquake

#

PikaThink

misty flint May 14, 2022, 1:21 AM

#

plush jungle ``` D:\Python\stylegan2-ada\stylegan2-ada-main>py generate.py --outdir=out --tru...

not sure. did you double check the requirements?

#

the dependencies seem pretty specific

plush jungle May 14, 2022, 1:23 AM

#

I'm not sure I have all the dependencies, but I don't think the github project works anyway, because when I just put in seeds and network, it gave me this error

#

https://github.com/NVlabs/stylegan2-ada/issues/74

#

and I tried the solutions in that but they didn't change it

#

my worry is that the code is just out of date and no longer maintained or compatible with tensorflow 2.x

misty flint May 14, 2022, 1:24 AM

#

misty flint not sure. did you double check the requirements?

bro...it says tensorflow 2.x is not supported here...

#

kekHands

plush jungle May 14, 2022, 1:25 AM

#

right, but when I tried to uninstall tensorflow 2 and get a tensorflow version with pip, it couldn't find any matching packages

#

so I don't think this code actually works anymore

misty flint May 14, 2022, 1:25 AM

#

welp. thats a bummer

plush jungle May 14, 2022, 1:25 AM

#

yeah. maybe I should try stylegan3

misty flint May 14, 2022, 1:26 AM

#

give it a shot

#

our team used pix2pix and cyclegan recently

#

so their stuff def works

#

DoggoKek

plush jungle May 14, 2022, 1:29 AM

#

misty flint our team used pix2pix and cyclegan recently

thanks for the suggestions! are those specialized for a certain type of image generation?

misty flint May 14, 2022, 1:31 AM

#

plush jungle thanks for the suggestions! are those specialized for a certain type of image ge...

they are pretty much the seminal models in the image translation problem space

#

image translation being like

#

stuff like turning daytime images into nighttime ones

#

and vice versa, etc.

#

but i believe cyclegan gives quite good results for plain image generation as well

#

pix2pix can be used for a lot of different image generation tasks as well

#

@plush jungle check this out https://github.com/junyanz/pytorch-CycleGAN-and-pix2pix

GitHub

GitHub - junyanz/pytorch-CycleGAN-and-pix2pix: Image-to-Image Trans...

Image-to-Image Translation in PyTorch. Contribute to junyanz/pytorch-CycleGAN-and-pix2pix development by creating an account on GitHub.

#

heres a horse2zebra model using cycleGAN i believe

#

kekHands

#

DoggoKek

plush jungle May 14, 2022, 1:46 AM

#

misty flint heres a horse2zebra model using cycleGAN i believe

that's awesome

misty flint May 14, 2022, 1:50 AM

#

right? its pretty dope

#

you can also train the model to do a variety of things

#

my imagination is limited but the potential is def there

#

kekHands

plush jungle May 14, 2022, 2:09 AM

#

@misty flint any idea how to set up Cuda with gpu?

#

I'm on windows 10 and I definitely have an nvidia gpu, but when I try to run stylegan3 I get

AssertionError: Torch not compiled with CUDA enabled

#

and google is not being very clear about how to do it

misty flint May 14, 2022, 2:17 AM

#

not sure. have you checked this out: https://www.tensorflow.org/install/gpu

TensorFlow

GPU support | TensorFlow

#

this might also be helpful https://docs.nvidia.com/cuda/cuda-installation-guide-microsoft-windows/

Installation Guide Windows :: CUDA Toolkit Documentation

The installation instructions for the CUDA Toolkit on MS-Windows systems.

#

theres some documentation on pytorch too https://pytorch.org/docs/master/notes/cuda.html#cuda-semantics

misty flint May 14, 2022, 3:08 AM

#

hmm hmm

#

can you do transformations inside big query

#

blobhyperthink

#

they have their Big Query ML

#

but like

#

i think thats autoML or something

#

tbh im not sure since even after reading it im still confused

#

kekHands

#

interesting

#

there is google data studio

#

but it just looks like a worse tableau

#

oh i forgot to mention

#

earlier i tried AWS Sagemaker for the first time today

#

pretty interesting

#

its like a jupyter notebook but on aws

#

you might ask what could you need this for? you can train a model on aws and create an API endpoint with Sagemaker

#

and then create some web app to call the API for model inference

#

TIL many things, ~~like how annoying working with the cloud is~~ kekHands

safe elk May 14, 2022, 3:28 AM

#

misty flint TIL many things, ~~like how annoying working with the cloud is~~ <:kekHands:9486...

Yeah bills can be annoying

misty flint May 14, 2022, 3:43 AM

#

safe elk Yeah bills can be annoying

bro luckily i stopped myself before trying anything on my personal aws account

#

my boss was like

#

oh let me get you aws access for the company

#

since these things can end up costing you a wild amount of money sometimes

#

im like

#

uhh ok

#

kekHands

#

ill try not to go into the tens of thousands i guess

#

kekHands

#

i will quadruple-check to make sure i dont leave instances running

#

NervousSip

misty flint May 14, 2022, 4:20 AM

#

@serene scaffold https://mlopsfluff.dstack.ai/p/notebooks-and-mlops-choose-one

Notebooks and MLOps. Choose one.

In the previous issue, I wrote about what MLOps suffers from. Now that I come to think of it, I have realized that it is worth writing about one more thing that stands in our way towards MLOps. You know this thing very well. It’s Jupyter notebooks. In fairness to Jupyter notebooks, they have become the standard way of prototyping ML models all o...

#

i found your alternate self

#

kekHands

#

this is a good summary:

For any ML model, the time spent in a Jupyter notebook is inversely proportional to its reproducibility. The reasons behind this rule are poor modularity and reusability of the code in notebooks, and poor integration with Git. The worst part of it is the habit of using notebooks which incentivizes the practices that go against reproducibility. This seems to be a vicious circle. We use notebooks because they are a great way of prototyping models or exploring data. However, the more you use notebooks, the more problems you’ll face at the deployment stage.

plush jungle May 14, 2022, 5:41 AM

#

I'm trying to run stylegan3 code, but when I do I get

    raise RuntimeError("Ninja is required to load C++ extensions")
RuntimeError: Ninja is required to load C++ extensions

#

I'm on windows 10 and I've already installed ninja version (1.10.2.3)

green wasp May 14, 2022, 6:38 AM

#

plush jungle right, but when I tried to uninstall tensorflow 2 and get a tensorflow version w...

https://www.tensorflow.org/install/pip

TensorFlow

Install TensorFlow with pip

#

Hope this helps!

flat fiber May 14, 2022, 6:42 AM

#

Hi. I need to classify 25 unlabelled images, depending on their similarity.
I have never done Computer Vision. Can someone tell me a general process to tackle this, or a tutorial that does something similar?

quasi ether May 14, 2022, 8:42 AM

#

what does embedding_dim arg do in keras.layers.Embedding?

wooden sail May 14, 2022, 8:49 AM

#

it's the dimension of the vector space you want to use to encode a set of integers

#

you can think of it as a function f: {0, ..., N-1} -> R^n, where N is the number of distinct integers you are encoding and n is the embedding_dim

#

for example, if you chose n=3, you're asking keras to represent your list of integers by using vectors in R^3, i.e. vectors of the form [x,y,z], where x,y,z are no longer integer valued

#

the idea is that by doing this, you assign a geometric representation to the integers of the classes or words you were encoding, and this makes it possible to look for structure like clustering of similar classes. whether the clustering is evident depends on the number of dimensions. too few makes it so that no structure is present. too many becomes wasteful because structure usually has a low rank
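
A rough way to picture the explanation above is a lookup table with one row per integer. The NumPy sketch below is not Keras code; it only imitates what the layer does at lookup time, with a random table standing in for the weights Keras would learn. In `keras.layers.Embedding` the table width is set by the `output_dim` argument, so `embedding_dim` in the question is presumably a variable passed in that position. The sizes and ids are made up for illustration.

```python
import numpy as np

vocab_size = 10      # number of distinct integer ids (hypothetical)
embedding_dim = 3    # the n in the explanation above

rng = np.random.default_rng(0)
# One row per integer id; in Keras these rows are trainable weights.
table = rng.normal(size=(vocab_size, embedding_dim))

token_ids = np.array([4, 1, 4, 7])
vectors = table[token_ids]   # lookup: each id is replaced by its row

print(vectors.shape)  # (4, 3)
```

The same id always maps to the same vector, so rows 0 and 2 of `vectors` are identical here.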

steady basalt May 14, 2022, 10:48 AM

#

Who was it that wanted tf gpu benchmarked

#

@misty flint ?

#

M1 pro using tf metal for the gpu is so fucking fast

steady basalt May 14, 2022, 12:53 PM

#

And co lab is faster

#

Well

#

😦

#

if anyone cares

#

#

Time (s) to convolve 32x7x7x3 filter over random 100x100x100x3 images (batch x height x width x channel). Sum of ten runs.
CPU (s):
0.632617707999998
GPU (s):
0.12009491599928879
GPU speedup over CPU: 5x

#

google co lab Time (s) to convolve 32x7x7x3 filter over random 100x100x100x3 images (batch x height x width x channel). Sum of ten runs.
CPU (s):
4.349970917000064
GPU (s):
0.03639778600017962
GPU speedup over CPU: 119x

#

so... co lab is much much faster than my m1 gpu, seems wrong considering people have reported the m1 pro being faster
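
Working only from the numbers pasted above, the two "GPU speedup over CPU" figures are not directly comparable, because the CPU baselines differ a lot. A small sketch of the arithmetic (the dictionary and function names are just for illustration):

```python
# Timings copied from the two benchmark outputs above (seconds, sum of ten runs).
m1_pro = {"cpu": 0.632617707999998, "gpu": 0.12009491599928879}
colab = {"cpu": 4.349970917000064, "gpu": 0.03639778600017962}


def speedup(slow, fast):
    """How many times faster the second timing is than the first."""
    return slow / fast


m1_gpu_vs_cpu = speedup(m1_pro["cpu"], m1_pro["gpu"])
colab_gpu_vs_cpu = speedup(colab["cpu"], colab["gpu"])
colab_gpu_vs_m1_gpu = speedup(m1_pro["gpu"], colab["gpu"])
m1_cpu_vs_colab_cpu = speedup(colab["cpu"], m1_pro["cpu"])

print(f"M1 Pro GPU vs its CPU:   {m1_gpu_vs_cpu:.1f}x")
print(f"Colab GPU vs its CPU:    {colab_gpu_vs_cpu:.1f}x")
print(f"Colab GPU vs M1 Pro GPU: {colab_gpu_vs_m1_gpu:.1f}x")
print(f"M1 Pro CPU vs Colab CPU: {m1_cpu_vs_colab_cpu:.1f}x")
```

On these figures the Colab GPU is roughly 3.3 times faster than the M1 Pro GPU for this convolution, while the M1 Pro CPU is nearly 7 times faster than the Colab CPU, which is a large part of why Colab reports 119x.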

#

anyone any idea why

#

im using miniforge so i believe its native

ocean swallow May 14, 2022, 1:05 PM

#

serene scaffold nice. how far were you from the epicenter?

approx. 45km

ocean swallow May 14, 2022, 1:06 PM

#

misty flint wait which google service

it is straight up google service. a page opened similar to that 2-factor number bubbles. scary as hell.

#

Said an earthquake just happened. beware of aftershocks.

#

kinda like this but this is an actual prediction.

#

I think they are using accelerometer data of people around and sending them faster than the shockwave. wow

urban prism May 14, 2022, 1:10 PM

#

The future is now.

iron basalt May 14, 2022, 1:23 PM

#

steady basalt anyone any idea why

This is probably just TF being bad. The CPU/GPU difference for convolutions should probably be much larger (on pretty much any device that has GPGPU). Some have TF slower on GPU than on CPU for the m1 (clearly something is broken).

steady basalt May 14, 2022, 1:23 PM

#

iron basalt This is probably just TF being bad. The CPU/GPU difference for convolutions shou...

I have read in many places that the m1 gpu beats COLAB though, and from the chart you can see i plotted training a basic neural net the k80 wins

#

how did those people get the correct performance

iron basalt May 14, 2022, 1:25 PM

#

steady basalt I have read in many places that the m1 gpu beats COLAB though, and from the char...

Did they do the same task?

steady basalt May 14, 2022, 1:25 PM

#

yes

#

maybe you need to install numpy specially?

iron basalt May 14, 2022, 1:25 PM

#

Numpy does not use the GPU.

steady basalt May 14, 2022, 1:26 PM

#

https://stackoverflow.com/questions/70240506/why-python-native-on-m1-max-is-greatly-slower-than-python-on-old-intel-i5

Stack Overflow

Why Python native on M1 Max is greatly slower than Python on old In...

I just got my new MacBook Pro with M1 Max chip and am setting up Python. I've tried several combinational settings to test speed - now I'm quite confused. First put my questions here:

Why python run

#

have to use veclib?

iron basalt May 14, 2022, 1:27 PM

#

Also a K80 is a pretty beefy device relative to the M1, so while it could be faster than the K80, it would probably require a decent amount of effort that actually makes use of the M1 in the way that it wants to be used.

#

I don't think most are used to working with that kind of device.

steady basalt May 14, 2022, 1:27 PM

#

initial benchmarks when it released quickly found its better

iron basalt May 14, 2022, 1:28 PM

#

Who tested it?

steady basalt May 14, 2022, 1:28 PM

#

like tonnes of people google it

#

so not sure why i found it slower than co lab

iron basalt May 14, 2022, 1:30 PM

#

Can you link me to one that does the same benchmark you showed above of the 32x7x7x3 on 100x100x100x3 images?

steady basalt May 14, 2022, 1:39 PM

#

closed it now but its

#

official google or tf benchmarking code

#

u shud be able to find it by pasting

wooden vault May 14, 2022, 1:48 PM

#

is it possible to make a python program that plays an online driving game. (asked out of curiosity)

iron basalt May 14, 2022, 2:00 PM

#

Well, IDK, but the largest M1 max's GPU is about 10.4 TFLOPS while the K80 is about 8.73 TFLOPS. The largest M1 pro has half as many GPU execution units as the M1 max. So just from that it seems a bit suspicious. But, if the model is not too large I could totally see the M1 being faster simply due to its GPU being integrated (memory/batch transfer rate / speed) (for training, for non-training it's probably even faster).

#

*Single precision.

#

**Nvidia lies all the time though so the 8.73 TFLOPS could be complete BS.

#

**But depends how much Apple is lying too.

iron basalt May 14, 2022, 2:06 PM

#

steady basalt official google or tf benchmarking code

^

steady basalt May 14, 2022, 2:07 PM

#

iron basalt Well, IDK, but the largest M1 max's GPU is about 10.4 TFLOPS while the K80 is ab...

So any reason why training a model on iris data takes 6ms per step on co lab and 14 on m1 pro

iron basalt May 14, 2022, 2:09 PM

#

Last I checked in on TF the issue was not using the m1's zero copy memory transfer to the GPU so it would only be faster if you did large enough batch sizes.

#

Back then the CPU was often reported as faster than GPU.

#

So it probably is better now, but still.

steady basalt May 14, 2022, 2:11 PM

#

So if I used a massive data set I’d find the m1 is faster than k80?

#

I’ll give it a go later

iron basalt May 14, 2022, 2:11 PM

#

The m1 or m1 pro or m1 max (and there are multiple of each)?

steady basalt May 14, 2022, 2:11 PM

#

The iris data is like 1400

#

M1pro

iron basalt May 14, 2022, 2:13 PM

#

Maybe, gotta just try it. But if the TFLOPS are true (although they are theoretical) then no.

#

In practice most programs won't even make use of like 20% of the theoretical.

steady basalt May 14, 2022, 2:14 PM

#

There’s no way google are cheating right, they wouldn’t just rig notebook to run better on iris dataset

iron basalt May 14, 2022, 2:14 PM

#

(takes too much effort, people don't want to spend so much time on a single device when new ones come out all the time and the libraries are targeting many different ones)

steady basalt May 14, 2022, 2:14 PM

#

And the m1 demolishes anything similar to it cpu wise, I just need to get RTX benchmark next to see

iron basalt May 14, 2022, 2:14 PM

#

Google, Apple, Nvidia, and AMD heavily cheat on performance benchmarks and straight up lie (sometimes they pay fines for it like Nvidia, but that is a small cost for them).

#

(Nvidia lies probably the most IMO)

#

(And they also lock off parts of the hardware unless you pay for a special license or hack it)

#

For CPUs, yeah, the M1 should win, at least in performance per watt, and for the GPU too.

#

For CPUs the m1 max is very fast, but still slower than the fastest AMD CPUs.

#

Although they use way less power, which is what really matters if you want to have a bunch of them training stuff.

#

M1 should beat out current Intel CPUs (in pretty much every way) (until Intel finishes their new factory and maybe their new stuff is better).

steady basalt May 14, 2022, 2:18 PM

#

It’s twice as fast as Ryzen 9

iron basalt May 14, 2022, 2:18 PM

#

Which Ryzen 9?

steady basalt May 14, 2022, 2:18 PM

#

HX

#

I posted the benchmark

obsidian bough May 14, 2022, 2:21 PM

#

Hey

#

Anyone know what happened to pywhat?

#

It's not working in my code

#

I tried to repair it

#

But it didn't work

iron basalt May 14, 2022, 2:24 PM

#

steady basalt HX

The mobile CPU? Yeah that's a weaker CPU.

#

If by similar you mean for power consumption yeah.

#

The power consumption on AMD CPUs is pretty bad, but ofc if you are not worried about that, then AMD CPUs are def. faster than any M1.

obsidian bough May 14, 2022, 2:28 PM

#

obsidian bough Anyone know what happened to pywhat?

!pypi pywhatkit

arctic wedgeBOT May 14, 2022, 2:28 PM

#

pywhatkit v5.3

PyWhatKit is a Simple and Powerful WhatsApp Automation Library with many useful Features

obsidian bough May 14, 2022, 2:29 PM

#

obsidian bough It's not working in my code

???

versed gulch May 14, 2022, 3:11 PM

#

Hi is there any way I can save my numpy array of shape (30, 400, 400) = (number of slices/images, height, width) as an image file type?

wooden sail May 14, 2022, 3:15 PM

#

PIL, scipy, and cv2 all seem to have ways of doing this

#

if it's just 2D images, you could also save each slice using matplotlib in a loop, using plt.savefig(...) while iterating over the slices

versed gulch May 14, 2022, 3:17 PM

#

i want to use the library SITK which only reads images that are of 3D file types thats why I need to save the image as a 3D format

wooden sail May 14, 2022, 3:19 PM

#

in that case you'd wanna look up how to convert exactly into the format you need, since having more than 4 layers in an image is not a standard image format that goes around

versed gulch May 14, 2022, 3:19 PM

#

wooden sail PIL, scipy, and cv2 all seem to have ways of doing this

PIL doesnt seem to be able to do this

wooden sail May 14, 2022, 3:19 PM

#

you want the image to be 30 x 400 x 400?

versed gulch May 14, 2022, 3:19 PM

#

yh

wooden sail May 14, 2022, 3:20 PM

#

that's not a standard image format

#

anyway sitk can stack them afterwards, so the format doesn't matter

versed gulch May 14, 2022, 3:21 PM

#

even 400x400x30?

wooden sail May 14, 2022, 3:21 PM

#

you have to look up a format that supports more than 4 layers. none of the normal ones do

misty flint May 14, 2022, 3:21 PM

#

steady basalt

pithink

#

interesting

wooden sail May 14, 2022, 3:26 PM

#

sitk also says it only takes 2,3 and 4D images. what are you trying to do?

versed gulch May 14, 2022, 3:28 PM

#

wooden sail sitk also says it only takes 2,3 and 4D images. what are you trying to do?

im trying to run this function

def oof3response(image=None, radii=[], resp_type=3):
    print('   ¬ Compute OOF filter response ...')

    # response_type = 3 :: sqrt(max(0, l1) .*max(0, l2));
    # OOF tensor eigenvalues :: l1 >> l2 >> l3
    # normalisation_type: blob-like (0), curvilinear (1), planar (2)
    # sigma: sigma >= min(radii), otherwise normalisation_type = 0
    opts = {'ntype': 1, 'sigma': min(image.GetSpacing()), 'use_absolute': True,
            'radii': radii, 'resp_type': resp_type}

    if min(radii)<opts['sigma'] and opts['ntype']>0:
        print('Normalisation type is set to zero since sigma<min(radii)')
        opts['ntype'] = 0

    # image
    data = sitk.GetArrayFromImage(image)
    size = [image.GetSize()[i] for i in [2,1,0]]
    spacing = [image.GetSpacing()[i] for i in [2,1,0]]

    # output
    output_data = np.zeros_like(data).astype('float64')

    # Fast Fourier Transform
    fft = np.fft.fftn(data)

    # Radius from Fourier coordinates
    x, y, z = ifft_shifted_coord_matrix(size, spacing)
    x /= size[0] * spacing[0]
    y /= size[1] * spacing[1]
    z /= size[2] * spacing[2]
    radius = np.sqrt(x**2 + y**2 + z**2) + 1e-12

 ...... (could not be shown in discord as hit the character limit)

wooden sail May 14, 2022, 3:29 PM

#

the code seems to imply the image has only 3 axes

versed gulch May 14, 2022, 3:30 PM

#

wooden sail the code seems to imply the image has only 3 axes

which is what Im trying to do with mine

wooden sail May 14, 2022, 3:31 PM

#

and what did you create the image with?

versed gulch May 14, 2022, 3:32 PM

#

wooden sail and what did you create the image with?

originally these images are microscopy images

wooden sail May 14, 2022, 3:32 PM

#

all right. well, sitk seems to be able to make images out of numpy arrays

#

but it anyway seems like you want the image as a numpy array

#

how do you currently have the image stored?

versed gulch May 14, 2022, 3:33 PM

#

yh but I wont be able to do GetSize and GetSpacing

versed gulch May 14, 2022, 3:33 PM

#

wooden sail how do you currently have the image stored?

they're of PNG file format (2D slices of a 3D object)

wooden sail May 14, 2022, 3:34 PM

#

the easiest way, to me, looks like just reading the slices, putting them into a numpy array, and giving that numpy array to sitk

versed gulch May 14, 2022, 3:35 PM

#

yh but I wont be able to do this "GetSize and GetSpacing"

#

thats why Im having a problem

wooden sail May 14, 2022, 3:38 PM

#

hmm?

#

your images don't have that to begin with if they are just a bunch of 2d images in a generic format

#

if you have the measurement parameters as separate metadata, you can put it in yourself

versed gulch May 14, 2022, 3:40 PM

#

hmm okay so GetSpacing and GetSize would be meaningless if it wasn't originally saved as a 3D image file?

wooden sail May 14, 2022, 3:40 PM

#

more like, if that metadata was not included into the image format

versed gulch May 14, 2022, 3:41 PM

#

wooden sail more like, if that metadata was not included into the image format

metadata meaning the size of each pixel, e.g. 3 micrometres, and the spacing between them?

wooden sail May 14, 2022, 3:41 PM

#

mhm
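
The approach wooden sail describes (read the slices, stack them into one numpy array, attach the spacing yourself) could be sketched roughly as below. The slices here are blank stand-ins rather than real PNG data, and the spacing values are hypothetical; the real ones would come from the microscope's acquisition settings.

```python
import numpy as np

# Stand-ins for 30 PNG slices of 400 x 400 pixels; in practice each slice
# would be read from disk with an image library of your choice.
slices = [np.zeros((400, 400), dtype=np.uint8) for _ in range(30)]

# Stack along a new first axis: the shape becomes (z, y, x) = (30, 400, 400).
volume = np.stack(slices, axis=0)

# Hypothetical voxel spacing in (x, y, z) order, e.g. in micrometres.
spacing_xyz = (3.0, 3.0, 10.0)

print(volume.shape)

# With SimpleITK installed, array and metadata could then be combined:
#   image = sitk.GetImageFromArray(volume)   # numpy (z, y, x) -> sitk (x, y, z)
#   image.SetSpacing(spacing_xyz)
# after which image.GetSize() and image.GetSpacing() return meaningful values.
```

Keeping the slice axis first matches the reversed `[2,1,0]` indexing in the pasted function, which converts SimpleITK's (x, y, z) order back to numpy's (z, y, x).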

young granite May 14, 2022, 4:47 PM

#

hi guys i used plotly express to plot 3d scatter plot and now wanted to use plotly.go for a 3d surface plot of the same data.
Sadly when i define x, y and z the same way as i did in the scatter plot i run into a java error. So i looked up the plotly site and there the data is provided as arrays not as df columns. Would someone help me to make it work?

serene scaffold May 14, 2022, 4:49 PM

#

young granite hi guys i used plotly express to plot 3d scatter plot and now wanted to use plot...

try showing the code

young granite May 14, 2022, 4:56 PM

#

serene scaffold try showing the code

df_dict = {}
if group_name not in df_dict:
    df_dict[group_name] = [df_new]
    df_dict[group_name].append(df_new)
i stored dfs into a dict with the key being the group_name

data = {}
for group_name in df_dict:
    df_new = pd.concat(df_dict[group_name])

    df_new['Area'] = df_new['Area'].fillna(0)

    df = df_new.rename(columns={"ID#": "Compound Name:", "Temp": "Temp. [°C]"})

    if group_name not in data:
        data[group_name] = df

i concatenated the dfs of a group into one big df, from which i plotted

for group_name in data:
    Headline = group_name
    df = data[group_name]
    x = df["Temp. [°C]"]
    z = df["Area"]
    y = df.index

#

                    x=x,
                    range_y=[1, 42],
                    y=y,
                    z=z,
                    color=name,
                    hover_name=df["display_name"],
                    #log_z=True
                       )
and then used plotly express to plot a scatter plot.
Now i wanted to do the same thing with the go.Surface
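
A possible starting point, not a confirmed fix: `go.Surface` expects `z` as a 2D grid, whereas the scatter plot accepts three 1D columns. The sketch below reshapes long-format data into such a grid with a pandas pivot. The column names `temp`, `sample` and `area` and all values are made up for illustration; the real frame would use its own columns.

```python
import pandas as pd

# Made-up long-format data: one row per (sample, temperature) measurement.
df = pd.DataFrame({
    "temp":   [20, 20, 30, 30],
    "sample": [1, 2, 1, 2],
    "area":   [0.1, 0.2, 0.3, 0.4],
})

# Pivot into a grid: one row per sample, one column per temperature.
# This only works if every (sample, temp) pair occurs at most once.
grid = df.pivot(index="sample", columns="temp", values="area")

x = grid.columns.to_numpy()   # temperatures, one per grid column
y = grid.index.to_numpy()     # samples, one per grid row
z = grid.to_numpy()           # 2D array with shape (len(y), len(x))

print(z.shape)
```

The resulting arrays could then be passed on as `go.Surface(x=x, y=y, z=z)`.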

strange stag May 14, 2022, 6:59 PM

#

was hoping someone could tell me whats wrong with this gym environment

import random

from gym import Env
from gym.spaces import Discrete


class ShowerEnv(Env):
    def __init__(self):
        self.action_space = Discrete(3)
        self.observation_space = Discrete(100)
        self.state = 38 + random.randint(-3, 3)
        self.shower_length = 60

    def step(self, action):
        self.state += action - 1
        self.shower_length -= 1

        # Calculating the reward
        if 37 <= self.state <= 39:
            reward = 1
        else:
            reward = -1

        # Checking if shower is done
        if self.shower_length <= 0:
            done = True
        else:
            done = False

        # Setting the placeholder for info
        info = {}

        # Returning the step information
        return self.state, reward, done, info

    def reset(self):
        self.state = 38 + random.randint(-3, 3)
        self.shower_length = 60
        return self.state

#

cause when im using ray rllib, im getting a shape error

ValueError: Cannot feed value of shape (11, 256) for Tensor default_policy/Placeholder_default_policy/default_policy/fc_1/kernel/Adam:0, which has shape (100, 256)

#

here is the ray code: https://bpa.st/6JDQ

steady basalt May 14, 2022, 7:20 PM

#

misty flint pithink

yeah its not expected

misty flint May 14, 2022, 7:25 PM

#

squiggle has a point with those caveats too

steady basalt May 14, 2022, 7:33 PM

#

any idea for a fix?

#

also, does anyone know how to make another env in conda default?

#

i dont wana use base anymore i wana use my 'ml' env

#

permanently

#

else im gona have to like do some effort to clean up and install tf on base

misty flint May 14, 2022, 8:10 PM

#

steady basalt any idea for a fix?

thats squiggle's expertise. def knows gpu stuffs

steady basalt May 14, 2022, 8:20 PM

#

He has some ideas why but no fix

brazen spire May 14, 2022, 10:49 PM

#

what are some cheap options to get GPUs to run machine learning models?

#

My current GPU (RTX 2080 TI) doesn't have enough VRAM for my models

#

(11 GB)

iron basalt May 15, 2022, 12:11 AM

#

misty flint thats squiggle's expertise. def knows gpu stuffs

The M1 is too closed off and I don't test on it like most.

#

Apple just gives you their ML tools that they want you to use.

spare briar May 15, 2022, 12:55 AM

#

brazen spire what are some cheap options to get GPUs to run machine learning models?

why not aws? if you need cheap try vast.ai

spare briar May 15, 2022, 12:59 AM

#

strange stag cause when im using ray rllib, im getting a shape error ValueError:...

your optimizer expects batch size 100 but this batch has only 11 samples

spare briar May 15, 2022, 1:00 AM

#

steady basalt also, does anyone know how to make another env in conda default?

activate env in .bashrc or .zshrc

ripe flare May 15, 2022, 1:03 AM

#

Any scipy expert here?

serene scaffold May 15, 2022, 1:25 AM

#

ripe flare Any scipy expert here?

You should always ask your actual question, not ask if people know about the topic of the question.

strange stag May 15, 2022, 1:26 AM

#

@spare briar mm, could you help me a bit more? how do i increase the sample size?

spare briar May 15, 2022, 1:26 AM

#

strange stag @spare briar mm, could you help me a bit more? how do i increase the sa...

is your dataset size divisible by 100?

#

what is your dataloader

strange stag May 15, 2022, 1:28 AM

#

60 steps before a reset
not using a dataloader

#

apologies this isnt my code and my ML is really really really rusty

spare briar May 15, 2022, 1:29 AM

#

spare briar is your dataset size divisible by 100?

^

strange stag May 15, 2022, 1:29 AM

#

the dataset size is indefinite tho

#

dictated by the # of epochs

spare briar May 15, 2022, 1:29 AM

#

for some reason a batch has only 11 samples but your optimizer is complaining that it expects 100

strange stag May 15, 2022, 1:30 AM

#

what is telling the optimizer to expect 100?

spare briar May 15, 2022, 1:30 AM

#

my guess is you are iterating over a dataset where dataset % 100 = 11

strange stag May 15, 2022, 1:30 AM

#

think it would be better to keep the batch size the same tho

spare briar May 15, 2022, 1:30 AM

#

strange stag what is telling the optimizer to expect 100?

the batch size is defined as 100

strange stag May 15, 2022, 1:30 AM

#

ah

#

so somewhere in rays default config

spare briar May 15, 2022, 1:31 AM

#

and the optimizer has initialized with a kernel that expects this

misty flint May 15, 2022, 1:34 AM

#

iron basalt Apple just gives you their ML tools that they want you to use.

ofc

#

kekHands

strange stag May 15, 2022, 1:39 AM

#

thanks @spare briar

#

Got another ray rllib question

really got no idea whats happening

Worker crashed during call to step_attempt(). To try to continue training without the failed worker, set ignore_worker_failures=True.

code + env + code traceback ~> https://bpa.st/YOEA

misty flint May 15, 2022, 3:50 AM

#

redshift is basically amazon's version of postgres right?

#

i bring this up because apparently gcp's big query is used more for actual data warehousing

#

and has cooler features for ML apparently

#

PikaThink

orchid carbon May 15, 2022, 3:55 AM

#

UndefinedMetricWarning: Precision and F-score are ill-defined and being set to 0.0 in labels with no predicted samples. Use zero_division parameter to control this behavior

What is this telling me?

#

Is it bad bad?

potent parrot May 15, 2022, 4:06 AM

#

hidden frigate I'm struggling a bit with handling wheelEvents in pyqtgraph. Does anyone with p...

just saw this, I'm a pyqtgraph maintainer, might be able to help, what's up

worldly dawn May 15, 2022, 4:06 AM

#

misty flint redshift is basically amazon's version of postgres right?

no. redshift is more like presto or trino

misty flint May 15, 2022, 4:12 AM

#

ah i see, i see

#

what am i thinking of? amazon RDS?

#

tbh im still trying to learn the services atm

#

kekHands

worldly dawn May 15, 2022, 4:15 AM

#

misty flint tbh im still trying to learn the services atm

postgres is more like mysql or RDS/aurora

pure badge May 15, 2022, 4:15 AM

#

Hello everyone. I am going to do thesis on machine learning, AI. For that, I should learn Python. I do have basic programming knowledge in both C and Java. Can anyone suggest me the best free tutorial or road map which will be a huge help for my upcoming thesis? Thanks in advance!!

serene scaffold May 15, 2022, 4:27 AM

#

pure badge Hello everyone. I am going to do thesis on machine learning, AI. For that, I sho...

how long is your thesis going to take?

lapis sequoia May 15, 2022, 4:34 AM

#

I fed the whole X matrix and y labels to the cross_val_score function. Does it do the train test split automatically for each fold and calculate the accuracy? If yes then what's the point of using the k-fold generator which returns the indexes for each fold's train and test, if I just want the accuracy.

pure badge May 15, 2022, 4:50 AM

#

serene scaffold how long is your thesis going to take?

1 year. 3 semesters, 4 months each.

serene scaffold May 15, 2022, 4:50 AM

#

pure badge 1 year. 3 semester each 4 month.

what is it going to be about specifically?

pure badge May 15, 2022, 4:51 AM

#

I haven't selected any specific topic yet. I do have interest in Machine Learning and AI

serene scaffold May 15, 2022, 4:52 AM

#

well, we have this: https://www.pythondiscord.com/resources/?topics=data-science

Python Discord | Resources

We're a large, friendly community focused around the Python programming language. Our community is open to those who wish to learn the language, as well as those looking to help others.

pure badge May 15, 2022, 4:55 AM

#

serene scaffold well, we have this: https://www.pythondiscord.com/resources/?topics=data-science

thank you so much!!

spare briar May 15, 2022, 4:57 AM

#

orchid carbon UndefinedMetricWarning: Precision and F-score are ill-defined and being set to 0...

how do you measure precision in a class with no samples

#

this is just telling you it chose to set the scores to 0
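
To make the warning concrete: precision for a label is TP / (TP + FP), so it has no defined value when the model never predicts that label. The sketch below is a hand-rolled illustration, not scikit-learn's implementation; the function name `precision_for_label` and the sample labels are made up.

```python
def precision_for_label(y_true, y_pred, label, zero_division=0.0):
    """Precision = TP / (TP + FP) for one label.

    Returns `zero_division` when the label is never predicted, because
    the denominator TP + FP would be zero in that case.
    """
    # True labels of every sample that was predicted as `label`.
    predicted_as_label = [t for t, p in zip(y_true, y_pred) if p == label]
    if not predicted_as_label:
        return zero_division
    true_positives = sum(1 for t in predicted_as_label if t == label)
    return true_positives / len(predicted_as_label)


y_true = [0, 1, 2, 2, 1]
y_pred = [0, 1, 1, 1, 0]   # label 2 is never predicted

for label in (0, 1, 2):
    print(label, precision_for_label(y_true, y_pred, label))
```

A label that never shows up in the predictions is often a sign that the model ignores a class, which is worth checking even though the warning itself is harmless.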

serene scaffold May 15, 2022, 4:58 AM

#

spare briar how do you measure precision in a class with no samples

if a tree falls in a forest and Helen Keller is--nvm

lapis sequoia May 15, 2022, 5:15 AM

#

How can I explore a Boolean column. Other than value counts

misty flint May 15, 2022, 5:26 AM

#

this is for a data journalist position...interesting

#

bruh

serene scaffold May 15, 2022, 5:29 AM

#

lapis sequoia How can I explore a Boolean column. Other than value counts

Do those bools correlate with another feature in an interesting way?

lapis sequoia May 15, 2022, 5:30 AM

#

Just wanna do it solo rn. @serene scaffold

serene scaffold May 15, 2022, 5:30 AM

#

lapis sequoia Just wanna do it solo rn. @serene scaffold

Not sure what you mean. I don't think there's any inherent virtue in looking at columns of data in isolation.

lapis sequoia May 15, 2022, 5:31 AM

#

There is when you are in school 🤪

#

Well It's just analysis, like finding how data is distributed, outliers, etc.

#

For categorical ones I just wrote the percentage of each unique value.

thin pelican May 15, 2022, 8:26 AM

#

Can the samples you use for accuracy be from the dataset used to train the model?

wooden sail May 15, 2022, 8:27 AM

#

for validation, you mean?

#

if so, it's a bad idea to do that. some recent papers have shown that under relatively mild conditions, the error w.r.t. the training data set will decay to 0, and this tells you nothing about the predictive power when the model is used on new data

winged vessel May 15, 2022, 8:35 AM

#

thin pelican Can the samples you use for accuracy be from the dataset used to train the model...

I wouldn't recommend it as it doesn't show if the model is overfitting and you can't see how well it works on data it's never seen before. Instead use validation_split=0.1 as a parameter in the model.fit function
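
Outside of Keras, the same idea of holding back data the model never trains on can be sketched by hand. The arrays below are random placeholders and the 10% fraction simply mirrors the `validation_split=0.1` suggestion above.

```python
import numpy as np

rng = np.random.default_rng(0)

# Placeholder dataset: 100 samples with 4 features and binary labels.
X = rng.normal(size=(100, 4))
y = rng.integers(0, 2, size=100)

# Shuffle the sample indices, then reserve 10% of them for validation.
indices = rng.permutation(len(X))
n_val = int(0.1 * len(X))
val_idx, train_idx = indices[:n_val], indices[n_val:]

X_train, y_train = X[train_idx], y[train_idx]
X_val, y_val = X[val_idx], y[val_idx]

print(X_train.shape, X_val.shape)
```

Accuracy measured on `X_val` then reflects samples that played no part in fitting.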

thin pelican May 15, 2022, 8:36 AM

#

Alright thanks amigos

steady basalt May 15, 2022, 9:14 AM

#

brazen spire My current GPU (RTX 2080 TI) doesn't have enough VRAM for my models

what models

brazen spire May 15, 2022, 9:15 AM

#

Rendering

#

Computer vision

#

that's why I need more than 11 gb of Vram because I'm limited on the size of the batch at 7

#

can't go beyond or else I run out of Vram after 1 epoch and it crashes

strange stag May 15, 2022, 11:00 AM

#

strange stag Got another ray rllib question really got no idea whats happening Worker c...

bump

hasty kiln May 15, 2022, 11:40 AM

#

Which is better for machine learning deployments, Django or Flask, regardless of the learning curve 😶?

wooden sail May 15, 2022, 11:51 AM

#

hmm are either of those for ML?

strange stag May 15, 2022, 11:56 AM

#

hasty kiln Which is better for machine learning deployments, Django or Flask, regardless of...

lol wut, those arent for ML.... they are for webapps n the like

hasty kiln May 15, 2022, 12:22 PM

#

strange stag lol wut, those arent for ML.... they are for webapps n the like

I mean this https://datasciencenerd.com/django-vs-flask-what-works-better-for-machine-learning/

Data Science Nerd

Daisy

Django vs. Flask: What Works Better for Machine Learning? | Data Sc...

Flask is best for beginners while Django is for more advanced machine learning deployments. Flask is a microframework making it more reliant on extensions for functionality. Django is a full-stack web framework. It comes with more ready to access features.

#

I should have made the question clear pithink

orchid carbon May 15, 2022, 2:07 PM

#

spare briar how do you measure precision in a class with no samples

By samples you mean labeled data?

steady basalt May 15, 2022, 2:09 PM

#

@misty flint i must say the laptops great

marble stag May 15, 2022, 2:22 PM

#

Hello, i recently started learning about ml and encountered SVR. i understood the theory behind it but am having a hard time understanding the maths behind it. can anyone teach me?

serene scaffold May 15, 2022, 2:33 PM

#

marble stag Hello, i recently started learning about ml and encountered SVR. i understood...

how much do you understand about it currently?

steady basalt May 15, 2022, 2:35 PM

#

start with SVM?

marble stag May 15, 2022, 2:38 PM

#

i know that it creates a tube where error is accepted and all the values outside the tube are support vectors, so i wanna know what happens mathematically. i am not good at maths so i am having trouble understanding.

marble stag May 15, 2022, 2:39 PM

#

steady basalt start with SVM?

i started with linear regression then polynomial regression then came svr

heady rivet May 15, 2022, 3:37 PM

#

Hi im about to go to college and they offered me a scholarship to study either

#

information systems degree or business analytics

#

Is there anyone in the field who can tell me which one is more related to data science

#

I like to make webscraping scripts and manipulate and play with the data

#

So i want to ask which speciality is closest to my interests, because I am not familiar with the field of databases or data analysis
And sorry for the disturbance, but it's a pivotal point in my life and I want to ask so as not to regret it

rose agate May 15, 2022, 3:45 PM

#

heady rivet Hi im about to go to college and they offered me a scholarship to study either...

Is there a list of classes/units that each degree will take? Or maybe a summary of the degree contents? At my university if you just search the degree you'll find a summary of the degree and the core units that you must take, which should be helpful

heady rivet May 15, 2022, 3:46 PM

#

@rose agate i have the curriculum for both of them, can you look at them and tell me??

rose agate May 15, 2022, 3:47 PM

#

I don't know if I'd be able to say what's best, but maybe send them in this chat and someone else can try?

opaque estuary May 15, 2022, 3:48 PM

#

heady rivet information systems degree or business analytics

University courses usually suck when it comes to these. But if you go in depth business analytics will be more useful to you compared to IS.

heady rivet May 15, 2022, 3:49 PM

#

opaque estuary University courses usually suck when it comes to these. But if you go in depth b...

thanks for the advice

#

https://media.discordapp.net/attachments/811302880610222140/811303653105205268/76.gif

opaque estuary May 15, 2022, 3:49 PM

#

I still suggest you make your own decision.

mighty spoke May 15, 2022, 3:52 PM

#

Hi I'm trying to bin some values but the data frame has Nan values and the max count in bins is 2, i'm not sure why though any help appreciated
```
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
from scipy.fft import rfft, rfftfreq
from scipy import fftpack
import scipy.signal as sg
from PyAstronomy.pyasl import foldAt
#method 2 for binning

t=np.arange(0,1.024, 4e-3)
fr=60.546875# fundamental frequency
Tp=1/fr#time period

phases = foldAt(t, Tp, T0=0)

plt.figure()
#data is the intensity

xp, yd = zip(*sorted(zip(phases, data)))#ensures x and y values correspond to each others in pairs when sorted

plt.plot(xp, yd)#plot the phase xp and Flux

df4 = pd.DataFrame({'X' : xp, 'Y' : yd}) #we build a dataframe from the data
M=20#no of bins

bins=np.arange(0, max(phases), step=Tp/M)

categorical_object = pd.cut(xp, bins)
count=pd.value_counts(categorical_object)#count in bins
grp = df4.groupby(by = categorical_object) #we group the data by the cut
ret = grp.aggregate(np.mean)#calculates the mean in each bin
plt.figure()
plt.plot(ret.X, ret.Y)
```
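
One possible cause, offered as a guess rather than a confirmed diagnosis: if `foldAt` returns phases in the range 0 to 1, then a bin step of `Tp/M` (about 0.0008) creates on the order of a thousand bins for roughly 256 samples, so most bins stay empty (NaN means) and the rest hold very few points. The sketch below uses a step of `1/M` in phase units instead. It replaces `foldAt` with a plain modulo, which is assumed to be equivalent, and uses a made-up sine signal in place of `data`.

```python
import numpy as np
import pandas as pd

t = np.arange(0, 1.024, 4e-3)        # time axis from the question
period = 1 / 60.546875               # period from the question
phases = (t / period) % 1.0          # assumption: same result as foldAt(t, period, T0=0)
flux = np.sin(2 * np.pi * phases)    # made-up stand-in for the measured `data`

n_bins = 20
edges = np.linspace(0.0, 1.0, n_bins + 1)   # bin width is 1 / n_bins in phase units

df = pd.DataFrame({"X": phases, "Y": flux})
phase_bin = pd.cut(df["X"], edges, include_lowest=True).rename("phase_bin")
grouped = df.groupby(phase_bin, observed=False)

counts = grouped.size()   # number of samples per bin
means = grouped.mean()    # mean phase and mean flux per bin

print(counts.min(), counts.max())
```

With 20 bins spanning the whole phase range, every sample lands in a bin and the per-bin means can be plotted with `plt.plot(means.X, means.Y)` as in the original.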

hidden frigate May 15, 2022, 4:02 PM

#

potent parrot just saw this, I'm a pyqtgraph maintainer, might be able to help, what's up

awesome, thanks! I kiiiind of solved my problem for now. I was trying to get an ImageView to scroll through slices of a (2D) matrix with mouse wheel scrolls, but it wasn't calling the wheelEvent function I had defined. I ended up discovering that ImageView either doesn't inherit the wheelEvent function or isn't passed wheelEvent calls, but the view attribute does, so I can just override the wheelEvent definition for the view attribute. It's still a bit hacky, but it works. I appreciate the offer for help! I'll hit you up if I have more questions; I'm trying to exercise pyqtgraph for a work project, so I'm sure I'll find more bits and bobs.

potent parrot May 15, 2022, 4:03 PM

#

hidden frigate awesome, thanks! I kiiiind of solved my problem for now. I was trying to get a...

that's a clever use of wheelEvent; but yeah not really hacky to monkey-patch in this case, you sort of have to do that, hacky or not 😆

orchid carbon May 15, 2022, 4:05 PM

#

ValueError: `logits` and `labels` must have the same shape, received ((32, 2) vs (32, 1)). What's this about?

#

What's a logit

misty flint May 15, 2022, 4:28 PM

#

steady basalt @misty flint i must say the laptops great

damn stop making me jealous

#

kekHands

serene scaffold May 15, 2022, 4:52 PM

#

orchid carbon ValueError: `logits` and `labels` must have the same shape, received ((32, 2)...

can you show the code related to the error?

orchid carbon May 15, 2022, 4:53 PM

#

Well it's working now, it was the loss function I had used

#

it's pretty big lemme try a site paster

#

https://paste.ofcode.org/pTeNsSgjJgJpBFpVFGTw3G

#

but I was using loss="binary_crossentropy"

strange stag May 15, 2022, 4:59 PM

#

@hasty kiln go with django

versed gulch May 15, 2022, 5:14 PM

#

Does anyone know how to add 2 gray scale images as numpy arrays (that are between 0-1) i.e. just overlaying the images over each other?

wooden sail May 15, 2022, 5:18 PM

#

if they're already numpy arrays, just do +. they need to have the same or broadcastable sizes though
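
One thing to watch with a plain `+` on images scaled to 0-1 is that the sum can exceed 1. The sketch below shows two common ways of combining such arrays; the pixel values are made up, and which variant counts as a correct overlay depends on the images.

```python
import numpy as np

# Two tiny grayscale "images" with values in [0, 1].
a = np.array([[0.0, 0.6],
              [1.0, 0.2]])
b = np.array([[0.0, 0.7],
              [0.5, 0.2]])

# Plain addition can leave the [0, 1] range, so clip it back afterwards.
summed = np.clip(a + b, 0.0, 1.0)

# Alternative: keep the brighter pixel of the two at each position.
brightest = np.maximum(a, b)

print(summed)
print(brightest)
```

If the unclipped sum is handed to a viewer, values above 1 may be clipped or rescaled depending on the tool, which can change how mid-range pixels look.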

versed gulch May 15, 2022, 5:21 PM

#

wooden sail if they're already numpy arrays, just do +. they need to have the same or broadc...

hmm but i get unwanted grey pixels that shouldnt be there

hidden frigate May 15, 2022, 5:23 PM

#

orchid carbon ValueError: `logits` and `labels` must have the same shape, received ((32, 2)...

logit is a statistics term; practically, in this case it probably means the unnormalized scores for each class in a classifier, before a sigmoid or softmax turns them into probabilities. The function you're calling is trying to compare an array of predictions to the correct labels and it's an element-wise comparison, so the arrays need to be the same shape. It looks like the logits you're passing aren't a single number per prediction, but two numbers (the second dimension of the array shape is 2) while the labels only have 1. Check the output of whatever neural network layer or classifier you're coding; if the labels are shape (32,1) the logits have to be (32,1) also
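
The mismatch can be reproduced with plain arrays. In this sketch the shapes come from the error message, while the random values and the one-hot conversion are illustrative assumptions; the other common remedy is a single output unit, so that the model itself produces shape (32, 1).

```python
import numpy as np

rng = np.random.default_rng(0)

logits = rng.normal(size=(32, 2))           # two scores per sample, as in the error
labels = rng.integers(0, 2, size=(32, 1))   # one integer class id per sample

print(logits.shape == labels.shape)         # the element-wise loss needs these to match

# One way to make the shapes agree: expand each class id into a one-hot row.
one_hot = np.eye(2)[labels[:, 0]]           # shape (32, 2)

print(one_hot.shape == logits.shape)
```

Either way, the loss function has to fit the chosen label format, which is consistent with the poster's later note that changing the loss resolved the error.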

wooden sail May 15, 2022, 5:35 PM

#

versed gulch hmm but i get unwanted grey pixels that shouldnt be there

wdym they shouldn't be there?

hasty kiln May 15, 2022, 5:48 PM

#

strange stag @hasty kiln go with django

I finally understand what I mean 😂😂😂, I will probably go with Django

oblique drum May 15, 2022, 5:58 PM

#

How much stuff/programming do i need to program a website/app that detects if something is an apple or not

misty flint May 15, 2022, 6:17 PM

#

any of the major clouds have APIs that can do object recognition like that

#

if you dont want to build your own

#

otherwise you could probably build your own model with ImageNet

rose quarry May 15, 2022, 6:20 PM

#

How can I replace all values of nan with 0?

#

atm Im using this CNN_values2 = [[n if n!=np.nan else 0 for n in k[:4096]] for k in data_array2] but it still gives me nan values

vital ruin May 15, 2022, 6:52 PM

#

I am working with an xml file that I am parsing to a dict with xmltodict. Is there a way to use a list to call keys to sheet the info I am wanting out of this dict. I.E. data[list] instead of data[key1][0][key2]

#

Get*
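
One way to follow a list of keys into nested dicts and lists, sketched with the standard library only. The nested structure below is a made-up stand-in for what `xmltodict` might return, and `path` holds hypothetical key names.

```python
from functools import reduce
from operator import getitem

# Made-up nested structure standing in for a parsed XML document.
data = {"key1": [{"key2": "value"}]}

# The path mixes dict keys and list indices, in the order they are applied.
path = ["key1", 0, "key2"]

# Equivalent to data["key1"][0]["key2"].
value = reduce(getitem, path, data)

print(value)
```

A missing key or index raises the usual `KeyError` or `IndexError`, so wrap the call in `try`/`except` if the path may not exist.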

wooden sail May 15, 2022, 6:59 PM

#

rose quarry atm Im using this `CNN_values2 = [[n if n!=np.nan else 0 for n in k[:4096]] for ...

one of the properties of np.nan is that it is not equal to anything, including nan. you should instead use numpy's isnan function, like so:

In [1]: import numpy as np

In [2]: x = np.nan

In [3]: x
Out[3]: nan

In [4]: x == np.nan
Out[4]: False

In [5]: np.isnan(x)
Out[5]: True
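
Applied to the replacement problem itself, a short sketch; the array below is a made-up stand-in for `data_array2`.

```python
import numpy as np

# Small stand-in for data_array2, with a few missing values.
data_array2 = np.array([[1.0, np.nan, 3.0],
                        [np.nan, 5.0, 6.0]])

# List-comprehension form, mirroring the original but testing with np.isnan.
as_lists = [[0 if np.isnan(n) else n for n in row[:4096]] for row in data_array2]

# Vectorised forms that avoid the Python-level loop.
with_where = np.where(np.isnan(data_array2), 0.0, data_array2)
with_nan_to_num = np.nan_to_num(data_array2, nan=0.0)

print(with_where)
```

All three give the same values; the vectorised versions are usually faster on large arrays.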

rose quarry May 15, 2022, 7:03 PM

#

wooden sail one of the properties of np.nan is that it is not equal to anything, including n...

Thank you very much!

gloomy anvil May 15, 2022, 9:06 PM

#

I have the stupidest issue:

boxplot_df = pd.DataFrame()
for currency in currencies:
    sub_df = evaluation_df.query(f'{currency}'.format(currency=currency))
    boxplot_df [f'{currency}_allin'.format(currency=currency)] = sub_df['allin_trades'].copy()

I have an empty boxplot_df and want to add the column 'allin_trades' from the evaluation_df, if the string in the column 'currency' matches the currency i am currently looping through

#

If I run this code, I have the expected result for bitcoin, which is my first currency in the list, but for all following currencies I receive a 'nan' even though I can clearly see the populated column in the queried sub_df
What is the issue here? There must be some stupid bug that I am missing

serene scaffold May 15, 2022, 9:33 PM

#

@gloomy anvil you're using f strings and .format at the same time, which is wrong.

You can't selectively add columns. Every row always has an element for every column, and vice versa.

Is there any data transformation going on here? Or are you just copying certain columns from one df to another?

#

Because you don't want to allocate empty DataFrames and then copy stuff into it. If you want a new df that contains copies of the columns in a given df, you just do df[['a', 'b', 'c']] and what you get are copies.

gloomy anvil May 15, 2022, 9:35 PM

#

gloomy anvil May 15, 2022, 9:36 PM

#

serene scaffold @gloomy anvil you're using f strings and `.format` at the same time, whi...

thanks! I fixed the format as you can see in the screenshot at the bottom

serene scaffold May 15, 2022, 9:36 PM

#

Can you show evaluation_df?

#

Also I'm on my phone, so my explanations will be high level

gloomy anvil May 15, 2022, 9:37 PM

#

evaluation_df

#

it also has a column at the end called 'allin_trades'

#

the query part works fine as well. I can create the sub_df based on the currency and I can see the populated 'allin_trades' column

#

but somehow it does not copy to the boxplot_df... is this some stupid typo i might have? I am losing my mind about this simple stupid issue

serene scaffold May 15, 2022, 9:40 PM

#

@gloomy anvil look into how to pivot a DataFrame. Because that's what you're actually trying to do.

gloomy anvil May 15, 2022, 9:41 PM

#

serene scaffold @gloomy anvil look into how to pivot a DataFrame. Because that's what y...

will do! but is there any reason why the last line boxplot_df[f'{currency}_allin'] = sub_df['allin_trades'].copy() does not work?

serene scaffold May 15, 2022, 9:41 PM

#

gloomy anvil will do! but is there any reason why the last line boxplot_df[f'{currency}_a...

That's not how pandas works. You don't allocate empty space and put stuff into it later

#

Also, what is each row supposed to represent?

gloomy anvil May 15, 2022, 9:44 PM

#

each row should represent the return of different classification models for each currency. I want to create a boxplot for each column/currency

serene scaffold May 15, 2022, 9:45 PM

#

@gloomy anvil there are columns like TN, FP-- are these supposed to be the rows in the desired df?

gloomy anvil May 15, 2022, 9:51 PM

#

serene scaffold @gloomy anvil there are columns like TN, FP-- are these supposed to be t...

No, I have a evaluation_df with all currencies, models and evaluation data like True positives (TP, FP); and so on, as well as their return ('allin_trades'), respectively. I query the evaluation_df by currency and create a sub_df with all the data from the evaluation_df for the currency, that I queried. From the sub_df I want to take only the column 'allin_trades' and copy it to the boxplot_df into a new column with the currencies name

gloomy anvil May 15, 2022, 9:53 PM

#

serene scaffold @gloomy anvil there are columns like TN, FP-- are these supposed to be t...

and by the way: thank you for your time and effort to help me here. I feel so lost and I am very grateful for that

#

figured it out

#

i don't know why but this here works: all fields populated

#

#

I just added reset_index(drop=True).

#

saw it at stackoverflow, copy, pasted, works. Don't know what it does or why it works though
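
A likely explanation for why `reset_index(drop=True)` helped, sketched with made-up data: when a Series is assigned as a DataFrame column, pandas matches values by index label, not by position. The first currency's rows set the index of the new frame, and rows from later currencies carry different labels, so they have nothing to align with and become NaN. The frame contents and the boolean-mask filter below are assumptions for illustration, not the original `evaluation_df` or query.

```python
import pandas as pd

# Made-up stand-in for evaluation_df: rows 0-1 are bitcoin, rows 2-3 ethereum.
evaluation_df = pd.DataFrame({
    "currency": ["bitcoin", "bitcoin", "ethereum", "ethereum"],
    "allin_trades": [1.0, 2.0, 3.0, 4.0],
})

misaligned = pd.DataFrame()
aligned = pd.DataFrame()

for currency in ["bitcoin", "ethereum"]:
    sub_df = evaluation_df[evaluation_df["currency"] == currency]

    # Keeps the original labels: bitcoin has 0-1, ethereum has 2-3.
    misaligned[f"{currency}_allin"] = sub_df["allin_trades"]

    # Relabels every subset as 0, 1, ... so the columns line up by position.
    aligned[f"{currency}_allin"] = sub_df["allin_trades"].reset_index(drop=True)

print(misaligned)
print(aligned)
```

In `misaligned` the ethereum column is all NaN because labels 2 and 3 do not exist in an index that was fixed to 0 and 1 by the first assignment.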

gloomy anvil May 15, 2022, 9:59 PM

#

serene scaffold @gloomy anvil there are columns like TN, FP-- are these supposed to be t...

thanks again for your help and your time. very appreciated

royal hound May 15, 2022, 10:45 PM

#

hello fellas im tryna generate a image using perlin noise

#

where the white spots would be grey colored and the black spots would be transparent
(0,0,0,0)

#

I have tried countless efforts

#

but everything is slow

#

at least 1 minute

plush glacier May 15, 2022, 10:47 PM

#

royal hound but everything is slow

are you using libraries like numpy and what resolution images are you generating

mellow vapor May 15, 2022, 10:49 PM

#

how often will I have to write sql queries to retrieve data from databases, I mean we do have pandas and I can work on it to perform the necessary changes
I am familiar with the sql queries but do I need to have extremely solid grip on it or is it fine if I work on the data using libraries like pandas?

royal hound May 15, 2022, 10:49 PM

#

plush glacier are you using libraries like numpy and what resolution images are you generating

no im using some noise module from pypi and the resolution is 2048x2048

plush glacier May 15, 2022, 10:51 PM

#

you can try making some perlin noise in numpy might be a bit faster
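
Whichever noise generator is used, the colouring step can be done on the whole array at once rather than pixel by pixel. The sketch below is not Perlin noise: uniform random values stand in for the noise field, and the 0.5 threshold and the grey level are arbitrary choices.

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-in for a 2048 x 2048 noise field with values in [0, 1).
noise = rng.random((2048, 2048))

# Start with every pixel fully transparent: (0, 0, 0, 0).
rgba = np.zeros(noise.shape + (4,), dtype=np.uint8)

# Bright areas of the noise become opaque grey in a single vectorised step.
mask = noise > 0.5
rgba[mask] = (128, 128, 128, 255)

print(rgba.shape)
```

The resulting array has the height x width x 4 layout that image libraries usually accept for RGBA data.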

#

also how does the code look there might be other ways to improve the speed a bit

#

and is it possible to find out what parts take long (also i wont be available to help a lot because will go to bed now it is like 1am for me)

royal hound May 15, 2022, 11:39 PM

#

plush glacier you can try making some perlin noise in numpy might be a bit faster

im trying this currently

royal hound May 15, 2022, 11:39 PM

#

plush glacier and is it possible to find out what parts take long (also i wont be available to...

we on the same timezone NOOO

#

im currently getting this

#

#

https://paste.pythondiscord.com/ifigejitep

marsh yacht May 16, 2022, 12:07 AM

#

hello guys

#

im looking for a data scientist in my team

#

im currently working on a game project and its related to AI and machine learning

#

im also a data scientist

#

but i need your help

#

if you're interested just pm me

serene scaffold May 16, 2022, 12:55 AM

#

@marsh yacht please post the GitHub for the project in the chat. If this is a closed source project, kindly remove your messages

brave sand May 16, 2022, 1:33 AM

#

serene scaffold @marsh yacht please post the GitHub for the project in the chat. If thi...

do u have any knowledge in deep learning?

lapis sequoia May 16, 2022, 2:03 AM

#

marsh yacht if you're interested just pm me

mad sus buddy

serene scaffold May 16, 2022, 2:03 AM

#

brave sand do u have any knowledge in deep learning?

why do you ask?

brave sand May 16, 2022, 2:04 AM

#

serene scaffold why do you ask?

I got an internship for deep learning and I’m kinda screwed lol. just asking if it’s possible to learn the basics in 24 hours

serene scaffold May 16, 2022, 2:05 AM

#

brave sand I got an internship for deep learning and I’m kinda screwed lol. just asking if...

absolutely not.

#

but if it's an internship, they don't necessarily expect you to know.

brave sand May 16, 2022, 2:05 AM

#

could I learn the basics of PyTorch in 24 hours?

serene scaffold May 16, 2022, 2:05 AM

#

probably not.

brave sand May 16, 2022, 2:05 AM

#

could I DM you a picture of the email he sent? I’m not sure what the interview is asking for

serene scaffold May 16, 2022, 2:06 AM

#

so you haven't been offered the internship? you just have an interview for it?

brave sand May 16, 2022, 2:06 AM

#

yeah, it’s weird like that

#

he wants to interview me

#

and I’m kinda screwed

serene scaffold May 16, 2022, 2:06 AM

#

being interviewed for positions is normal. but why don't you copy and paste the email into this chat, with sensitive parts censored out?

brave sand May 16, 2022, 2:06 AM

#

alright

#

The Intro slides to MARL can be found here. You may be interested in this blog post which gives an exciting overview into the future of Artificial Intelligence with Multi-agent Reinforcement Learning. A summary of MARL is attached as a Word document along with relevant papers in MARL.

This summer, 4 students (1 UMD CS Undergrad, 1 Virginia Tech CS Undergrad, 2 Blair HS Junior students) will be working with me on MARL.

Working on the project will involve coding in python, pytorch and a background mathematical knowledge specifically in calculus, probability and optimization. All 4 students currently in my team are prepared in these areas so I will directly start introducing Reinforcement Learning from May 23 before the actual project in Multi-Agent Reinforcement Learning begins from May 30.

This position is unpaid. No Professor will be involved in this Summer research project. So if that's something that you are not interested in, please let me know immediately.

If you are interested, please meet me tomorrow at 7pm EST for a 20 minute time slot on Zoom. I will evaluate your Python programming skills, background Mathematical knowledge, and willingness to learn new things quickly. In case you qualify for the position, I will let you know immediately. Then, please let me know about your interest in the position by May 17.

serene scaffold May 16, 2022, 2:08 AM

#

brave sand The Intro slides to MARL can be found here. You may be interested in this blog p...

if the position isn't paid, I wouldn't even bother

brave sand May 16, 2022, 2:08 AM

#

oh I’m in high school, all internships are non paid practically. Getting an internship is lucky enough…

#

internships for me right now are for experience and resume

serene scaffold May 16, 2022, 2:09 AM

#

I suppose. but once you're in college/university, don't waste your time on unpaid internships.

if you don't already have "a background mathematical knowledge specifically in calculus, probability and optimization", there is no way you can learn enough in 24 hours to become a more competitive applicant for this position than you are currently.

brave sand May 16, 2022, 2:10 AM

#

yeah, I’m terrible at calc, and lin alg. should I just give up?

serene scaffold May 16, 2022, 2:10 AM

#

you can still continue with the process for the interview experience. and who knows, maybe something interesting will happen.

brave sand May 16, 2022, 2:11 AM

#

yeah, hopefully. do you think the python testing part will be difficult?

#

I’m terrible at programming under a time limit or stress

serene scaffold May 16, 2022, 2:12 AM

#

if you already know Python basics, it's unlikely that they'd ask you anything particularly esoteric. but I don't think you can really learn how to use pytorch if you don't understand deep learning in general

brave sand May 16, 2022, 2:12 AM

#

yeah, gotcha. is it inappropriate to ask programming questions in a channel during the interview?

serene scaffold May 16, 2022, 2:13 AM

#

I suppose that counts as cheating, but if you're in a Zoom interview, you can't just pause everything while you type out your question in Discord

#

and if you don't know the answers to their questions, just tell them what you do know. if you're squirming, they'll probably move on to a different question.

brave sand May 16, 2022, 2:17 AM

#

yeah, I understand

#

jeez this is tough

brave sand May 16, 2022, 2:21 AM

#

serene scaffold I suppose that counts as cheating, but if you're in a Zoom interview, you can't ...

how do zoom interviews work?

orchid carbon May 16, 2022, 2:22 AM

#

I added a second Dense layer with 64 neurons and softmax activation, and apparently it makes my net give the same prediction for every input

#

It's quite a simple problem, the simpler the less neurons required I suppose?

serene scaffold May 16, 2022, 2:26 AM

#

brave sand how do zoom interviews work?

you have your webcam on and you talk to them like you're sitting at a table. it's basically the same as in-person interviews, except you secretly don't wear pants, and everyone secretly knows it.

brave sand May 16, 2022, 2:27 AM

#

serene scaffold you have your webcam on and you talk to them like you're sitting at a table. it'...

holy do they actually know I don’t wear pants??

#

jkjk

misty flint May 16, 2022, 3:13 AM

#

they will know if you cheat lol

#

but anyway if youre in highschool and its unpaid, cant hurt to try for the interview

#

high schoolers have plenty of time anyways lol

#

might as well spend your summer doing something productive/educational

serene scaffold May 16, 2022, 3:23 AM

#

misty flint high schoolers have plenty of time anyways lol

in general, or just over the summer? because I think the amount of time HS students spend at school, plus commuting to school and back, plus homework, plus the kajillion extracurriculars that kids are pressured to participate in, is very unreasonable.

royal crest May 16, 2022, 5:20 AM

#

what would be a good way to visualise co-occurrence matrix? i'm thinking something like a graph network visualisation where the co-occurrence value determines the thickness of the line between two nodes but i don't think there are good ways to integrate with a pandas dataframe

#

one i'm looking at is holoviews, but it seems way too complicated for my purpose

#

I'm thinking along the lines of R's igraph

wooden sail May 16, 2022, 5:35 AM

#

a common approach is to just plot the matrix itself as an image

royal crest May 16, 2022, 5:39 AM

#

wooden sail a common approach is to just plot the matrix itself as an image

I know it's an option, but in the field something like this is more common

#

https://media.springernature.com/lw685/springer-static/image/art%3A10.1007%2Fs11412-021-09349-3/MediaObjects/11412_2021_9349_Fig4_HTML.png

pliant pewter May 16, 2022, 6:31 AM

#

Is there a canonical graduate level textbook for data science and AI/ML?

lapis sequoia May 16, 2022, 7:15 AM

#

what would be a good place to draw some 2d graphs? like I want to have some vectors over a graph, and then show possible vectors having norm of one.

royal crest May 16, 2022, 7:46 AM

#

royal crest what would be a good way to visualise co-occurrence matrix? i'm thinking somethi...

I managed to achieve it by using python-igraph! The documentation is quite poor, but it seems to integrate very well with pandas.
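For anyone landing here with the same question: a minimal sketch of the "thickness follows co-occurrence" idea, turning a symmetric co-occurrence matrix into a weighted edge list that a graph library can draw. The helper name `cooccurrence_edges`, the `max_width` parameter and the toy matrix are made up for illustration; the matrix is a plain nested dict here, and a pandas dataframe would need converting first.

```python
def cooccurrence_edges(matrix, max_width=8.0):
    """Return (node_a, node_b, weight, line_width) for each co-occurring pair.

    matrix is a symmetric nested dict: matrix[a][b] == matrix[b][a].
    Only the upper triangle is read, so each pair appears once.
    """
    labels = sorted(matrix)
    pairs = [
        (a, b, matrix[a][b])
        for i, a in enumerate(labels)
        for b in labels[i + 1:]
        if matrix[a][b] > 0
    ]
    if not pairs:
        return []
    top = max(weight for _, _, weight in pairs)
    # Scale widths so the strongest pair gets max_width.
    return [(a, b, weight, max_width * weight / top) for a, b, weight in pairs]


# Hypothetical toy data: B co-occurs with A three times and with C six times.
toy = {
    "A": {"A": 0, "B": 3, "C": 0},
    "B": {"A": 3, "B": 0, "C": 6},
    "C": {"A": 0, "B": 6, "C": 0},
}
edges = cooccurrence_edges(toy)
```

The first three fields of each tuple are the usual edge-list shape; the fourth is the value to pass as the edge width when plotting.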

zenith panther May 16, 2022, 11:16 AM

#

hello can anyone figure out where is the issue in this code ?

#

"ConnectionError: HTTPSConnectionPool(host='www.mubawab.tnhttps', port=443): Max retries exceeded with url: //www.mubawab.tn/en/a/7429900/apartment-for-sale-in-les-jardins-de-carthage-2-rooms-reinforced-door-and-double-glazing- (Caused by NewConnectionError('<urllib3.connection.HTTPSConnection object at 0x000002357B8BB6D0>: Failed to establish a new connection: [Errno 11001] getaddrinfo failed')) " this is the error that i get
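The host in that error, `www.mubawab.tnhttps`, looks like what you get when the site root is glued onto a link that was already absolute. That is only a guess, since the code itself isn't shown. A minimal sketch of a safer join using the standard library; `absolute_url` and the example path are made up for illustration:

```python
from urllib.parse import urljoin

BASE_URL = "https://www.mubawab.tn"  # site root, taken from the error message


def absolute_url(href, base=BASE_URL):
    """Return an absolute URL whether href is relative or already absolute."""
    return urljoin(base, href)


# An href that is already absolute comes back unchanged ...
full = absolute_url("https://www.mubawab.tn/en/a/7429900/example-listing")
# ... while a relative href gets the site root put in front of it.
relative = absolute_url("/en/a/7429900/example-listing")
```

Both calls end up with the same URL, so the scraper no longer has to know which kind of link the page used.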

smoky shadow May 16, 2022, 11:56 AM

#

Can someone just tell me for this code:

#

def speak(audio):
    path = r"D:\Python\Jarvis\Voice_Files\Voice"
    directories = os.listdir(path)

    # This would print all the files and directories
    for file in directories:
        filename = audio.replace(" ", "_")
        file_path = (rf"{path}\\{filename}"+".mp3")
        playsound(file_path)

#

why does the mp3 file keep repeating?
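The likely cause: the loop runs once per entry in the folder, but the loop variable `file` is never used, so the same path is built and played on every pass. A minimal sketch without the loop; the `base_dir` and `play` parameters are assumptions added so the example runs without audio hardware (in the original, `play` would be `playsound`):

```python
import os


def voice_file_path(audio, base_dir):
    """Build the mp3 path for a phrase, e.g. 'good morning' -> good_morning.mp3."""
    filename = audio.replace(" ", "_") + ".mp3"
    return os.path.join(base_dir, filename)


def speak(audio, base_dir, play):
    """Play the clip for `audio` exactly once and return the path used."""
    file_path = voice_file_path(audio, base_dir)
    play(file_path)  # one call, no loop over the directory listing
    return file_path


# Stand-in for playsound: just record which paths would be played.
played = []
speak("good morning", "Voice", played.append)
```

If the folder listing is still needed, for example to check that the clip exists, that check belongs before the single `play` call rather than around it.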

misty flint May 16, 2022, 12:40 PM

#

serene scaffold in general, or just over the summer? because I think the amount of time HS stude...

the ones that dont do extracurriculars have all the time in the world kekHands

brave sand May 16, 2022, 12:41 PM

#

that’s true

misty flint May 16, 2022, 12:41 PM

#

if youre serious about ML Engineering, this just came out from one of my favorite ML Eng

image_6e871f50-58b9-41c1-9ccb-ab2105d75f0020220516_072505.jpg

brave sand May 16, 2022, 12:42 PM

#

so what do you think the professor is going to ask me math and programming wise?

misty flint May 16, 2022, 12:44 PM

#

pliant pewter Is there a canonical graduate level textbook for data science and AI/ML?

some would say artificial intelligence by russell and norvig. pattern recognition by bishop is another common one. they dont really cover data science per se. more statistics. the former has more newer ML methods including CV, NLP, etc., while the latter has more classical ML

#

bit dense if you arent graduate level so..you have been warned

#

kekHands

brave sand May 16, 2022, 1:11 PM

#

misty flint bit dense if you arent graduate level so..you have been warned

how complex could the programming prompt be if he’s not expecting much?

spare briar May 16, 2022, 1:35 PM

#

Both of those books are 2nd year undergrad level (not to knock them, Bishop’s book might be my favorite textbook). For grad level content, if you let me know a more specific subfield you are interested in i might have recs

misty flint May 16, 2022, 1:40 PM

#

spare briar Both of those books are 2nd year undergrad level (not to knock them, Bishop’s bo...

i guess thats fair. but they are canonical imo

#

kekHands

pliant pewter May 16, 2022, 1:42 PM

#

They sound like a good starting point, I kind of need something general before getting into specifics

#

Are there any good books specifically about Python tools for ML (and related)? I've used Pandas a tiny bit

#

But would like to learn stuff like TensorFlow

spare briar May 16, 2022, 1:47 PM

#

misty flint kekHands

yes definitely must read bishop

misty flint May 16, 2022, 2:05 PM

#

pliant pewter But would like to learn stuff like TensorFlow

youre looking for more o'reilly books then; those tend to be more practical than theoretical

#

but like anokhi said

#

def check out bishop

#

for sure

arctic wedgeBOT May 16, 2022, 2:18 PM

#

Hey @mighty spoke!

You either uploaded a .txt file or entered a message that was too long. Please use our paste bin instead.

green wasp May 16, 2022, 2:18 PM

#

Hello GVbearwiggle I’ve checked the pins. Is it right to say that machine learning is a step up from data analytics? I try to find books about data analysis and I get ML books think

versed gulch May 16, 2022, 2:19 PM

#

Hi guys i have 2 arrays (one of them is just filled with 0's and 1's lets say this is arr2) after adding them i.e. arr1 +arr2, I only want to divide by 2 on those elements in arr1 that have been changed, can anyone help me with this?

wooden sail May 16, 2022, 2:22 PM

#

if using numpy array, you can do something like

...
my_sum = arr1 + arr2
indices = my_sum != arr1  # positions where adding arr2 changed the value
my_sum[indices] = my_sum[indices]/2
...
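A runnable sketch of the masking idea, selecting the positions where `arr2` is nonzero, since those are the elements the addition changed. The sample values are made up, and `arr1` is a float array here on purpose: assigning halves back into an integer array would truncate them.

```python
import numpy as np

arr1 = np.array([4.0, 6.0, 8.0, 10.0])
arr2 = np.array([0, 1, 1, 0])  # only 0s and 1s, as in the question

my_sum = arr1 + arr2
changed = arr2 != 0  # boolean mask, equivalent to my_sum != arr1
my_sum[changed] = my_sum[changed] / 2  # halve only the changed elements
```

The same result can be written without in-place assignment as `np.where(changed, (arr1 + arr2) / 2, arr1)`.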

misty flint May 16, 2022, 2:30 PM

#

green wasp Hello GVbearwiggle I’ve checked the pins. Is it right to s...

there is a decent amount of overlap. but data analysis tends to be more descriptive analysis (looking at the past) while classical ML tends to be predictive or even prescriptive analysis (looking at the future)

#

you are better off looking at statistics books for data analysis, since that would lay a good foundation i believe

green wasp May 16, 2022, 2:32 PM

#

Mhh think ok so I guess like. You need both?

#

Like I should have asked my question better. Is data analysis as a role more machine learning based, data analysis based or both?

wooden sail May 16, 2022, 2:33 PM

#

it's kinda difficult to make a clear-cut distinction, since these are buzz words that grew out of something else

#

to the extent that what "data analysis" and "machine learning" means depends on who you ask. if you have a job in mind, you can see what they are asking for. if not, it's kinda hard to say

#

e.g. you could do data analysis to prep data for machine learning. or, you could do machine learning as part of your data analysis to extract some useful info

#

you could put both under estimation theory, which falls under statistics... along with an overlap with other topics here and there

green wasp May 16, 2022, 2:37 PM

#

Mmmmm I see. I thought the field would be more like. Idk. Intertwined?

#

Like ok practical example. My pet project

#

I’m fetching live flight data and I want to make some visualization dashboards like total distance flown in a day, total time, fuel estimation, how many aircrafts of a given type

#

Which airports are more popular, which less so etc etc

#

What does that fall under

wooden sail May 16, 2022, 2:39 PM

#

depends what your boss calls it 😛 possibly data analysis?

#

again, the words are made up and misused, but it's likely you'll find it under "data analysis"

green wasp May 16, 2022, 2:42 PM

#

I don’t have a boss I’m doing it on my spare time

#

I’m a sysadmin PensiveCowboy

#

Like would that project even have a ML aspect? I can’t think of any think idk

wooden sail May 16, 2022, 2:46 PM

#

depends on how you wanna find the quantities you mentioned. distance flown, time, fuel estimation (!! this one, especially, since estimation is one of ML's applications), popularity (!! here too if you wanna use a fancy definition of popularity other than just counting all flights. if you have info from several disjoint time periods, for example, you might wanna try and predict the behavior in the times you didn't observe and then do some weighted sum or something, or directly estimate a popularity metric that describes the observed data)

green wasp May 16, 2022, 3:07 PM

#

Don’t have information from other time periods :/ but I could like

#

Ingest daily and predict the next day?

#

Or after a while

#

Like maybe live flight data isn’t the best to work with. Idk. Lots of updates for little information

#

Like it updates every 5 seconds, and all that changes usually is altitude and position, which I guess is what matters tbh? But also I’m not sure what to do with it

#

What could I do actually? I have the idea for the project but I’m now realizing i have a bunch of useless data that I’m collecting

wooden sail May 16, 2022, 3:31 PM

#

idk, maybe you don't need ML. you could still find interesting statistics and cool ways of visualizing and interpreting them

#

or maybe you could see if some of the values are good predictors for the others if you find that interesting. that could let you do supervised training

green wasp May 16, 2022, 3:35 PM

#

Mmmmm yeah but my main gripe rn is that I have a lot of data that means nothing

#

The live api is updated every 5/10 seconds and it’s great for studying and watching the planes on the map but from a data analysis/ml standpoint having that much data just adds noise I guess?

#

I’m not really sure. I’ve never done this and I’m making it up as I go, and I just now realized how much data I get and how little it changes in a short period of time

wooden sail May 16, 2022, 3:50 PM

#

that's a cool ML task on its own, you know? data interpolation. you use the super frequently gathered data to train so that, later, your network can take much fewer samples and fill in what is missing

#

then you can replace API calls with inference

green wasp May 16, 2022, 3:51 PM

#

👀

wooden sail May 16, 2022, 3:51 PM

#

or alternatively, instead of filling in missing samples using the API infrequently, you could still use it frequently and exploit those slow changes to predict the future samples for some time window

#

those are both cool applications

#

that the data changes slowly means it is structured, and that makes it a good candidate for inter and extrapolation
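Before training anything, a straight-line estimate from two samples makes a handy baseline for both ideas: it fills in a missing sample between them and predicts one after them. A minimal sketch; the function name, the `(time, lat, lon)` tuple layout and the numbers are assumptions, and treating latitude and longitude as flat coordinates is only reasonable over short distances.

```python
def linear_estimate(p0, p1, t):
    """Estimate (lat, lon) at time t from two samples (time, lat, lon).

    t between the two sample times interpolates; t after them extrapolates.
    """
    (t0, lat0, lon0), (t1, lat1, lon1) = p0, p1
    frac = (t - t0) / (t1 - t0)
    return lat0 + frac * (lat1 - lat0), lon0 + frac * (lon1 - lon0)


# Two hypothetical samples taken 10 seconds apart.
first = (0.0, 50.0, 8.0)
second = (10.0, 51.0, 9.0)

filled_in = linear_estimate(first, second, 5.0)   # missing sample in between
predicted = linear_estimate(first, second, 20.0)  # 10 seconds into the future
```

A learned model is only worth keeping if it beats this kind of baseline on samples held back from training.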

green wasp May 16, 2022, 3:52 PM

#

wah

#

So like I could take the position of aircraft X

#

Train the model to predict where that will go based on waypoints in the area, the flight path and the speed at which is traveling?

#

And see if I can predict the correct flight path of the aircraft

wooden sail May 16, 2022, 3:53 PM

#

sure

#

you know how they do those forecasts for hurricanes and tornadoes?

#

where they show current radar measurements and the current position of a storm

#

and a cone of probability where they expect it to be in the next hours/days

green wasp May 16, 2022, 3:54 PM

#

Mmmm

#

Something similar?

#

Or what else? Hit me with ideas, I’ve never done this before so I don’t know what questions to ask the data think

wooden sail May 16, 2022, 3:54 PM

#

you could do that probably only with the location and speed. and with the other info you have, possibly predict which airport it's headed to, with a probability updated on the fly

green wasp May 16, 2022, 3:55 PM

#

wooden sail you could do that probably only with the location and speed. and with the other ...

Yah!

#

Well not super useful since the data contains the destination airport

#

But that’s just an easy test tbh!

wooden sail May 16, 2022, 3:55 PM

#

sure, but if you're just doing this to play around, it's a nice experiment

#

it could fail, sure. then you test something else

green wasp May 16, 2022, 3:55 PM

#

yay

#

Yeah!

wooden sail May 16, 2022, 3:56 PM

#

maybe you'll get some other idea as you're setting it all up. gotta get your hands dirty at some point

green wasp May 16, 2022, 3:56 PM

#

Possibly

#

An idea I had is try to correlate airport destinations with holidays/festivities and weather

#

Like pick airport x. What’s the probability that that’s the diversion and the original airport is airport Y which has bad weather

#

Conversely, if that airport has bad weather what could be the diversion

#

Not sure how to go about it though

misty flint May 16, 2022, 4:03 PM

#

i think the more you work with different datasets, the more you develop your data intuition aka what questions you can ask the data vs. getting stuck

#

kekHands

green wasp May 16, 2022, 4:17 PM

#

Yeah laughing

#

I guess it’s also not very smart to plunge right into a big project like this but I’m confident that with the help of the discord server, some research and books I can at least get something working

#

Oh yeah forgor to ask! What books could help me with this project?

wooden forge May 16, 2022, 5:40 PM

#

Hey there, I'm wondering if there is a way to get the color of a histogram plot with Matplotlib that I already plotted?

long locust May 16, 2022, 6:04 PM

#

wooden forge Hey there, I'm wondering if there is a way to get the **color** of a **histogram...

I think the default is viridis
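A note on this one: viridis is Matplotlib's default colormap for image-like plots, while histogram bars take their color from the axes' color cycle. Either way the color can be read back from the bars that `hist` returns. A minimal sketch with made-up data; the Agg backend is chosen only so it runs without a display:

```python
import matplotlib

matplotlib.use("Agg")  # headless backend, no window needed
import matplotlib.pyplot as plt
from matplotlib.colors import to_hex

fig, ax = plt.subplots()
# hist returns the bin counts, the bin edges and the drawn bars.
counts, bin_edges, patches = ax.hist([1, 2, 2, 3, 3, 3, 4])

# All bars of one histogram share a face color, so the first bar is enough.
bar_color = to_hex(patches[0].get_facecolor())
```

For a histogram drawn earlier without keeping the return value, the bars should also be reachable through `ax.patches`.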

mint palm May 16, 2022, 6:05 PM

#

why is it not advised to have val set when dataset is too small??

misty mulch May 16, 2022, 6:50 PM

#

im trying to get into reinforcement learning, where should I begin?

serene scaffold May 16, 2022, 6:59 PM

#

misty mulch im trying to get into reinforcement learning, where should I begin?

what do you currently know about reinforcement learning, and ML in general?

misty mulch May 16, 2022, 7:03 PM

#

nothing rlly, was starting to get into it yesterday and learning Linear regression

main gorge May 16, 2022, 7:29 PM

#

Does anyone know how we can fetch schema from parquet files using python. And then create schema for hive external table?

#

Please help I am totally out of context and really need help in this.

#

Please help 🙏

#

I have been searching for this on Google for the past week

frigid elk May 16, 2022, 7:32 PM

#

a quick google shows

from pyarrow.parquet import ParquetFile
ParquetFile(source).metadata

main gorge May 16, 2022, 7:32 PM

#

Source file is on S3

#

Can I read parquet using this library?

#

And I need to connect to Hive using Pyspark and need to query

#

Does anyone know how to query hive on my local machine?

frigid elk May 16, 2022, 7:35 PM

#

import pyarrow.parquet as pq
import s3fs
s3 = s3fs.S3FileSystem()

pandas_dataframe = pq.ParquetDataset('s3://your-bucket/', filesystem=s3).read_pandas().to_pandas()

per https://stackoverflow.com/a/48809552/1538838

Stack Overflow

How to read a list of parquet files from S3 as a pandas dataframe u...

I have a hacky way of achieving this using boto3 (1.4.4), pyarrow (0.4.1) and pandas (0.20.3).

First, I can read a single parquet file locally like this:

import pyarrow.parquet as pq

path = 'par...

main gorge May 16, 2022, 7:37 PM

#

How can I create hive schema according from dataframe i receive

#

I am able to fetch the dataframe, but I can't work out how to automate creating the matching hive structure from it

#

I really need help creating the hive structure from the parquet dataframe I receive, whatever its type, be it a PySpark dataframe or a pandas dataframe
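One way to automate this is to generate the `CREATE EXTERNAL TABLE` statement from the column names and dtypes. A minimal sketch, not a complete solution: the type mapping is an assumption that covers only common pandas dtypes (anything unknown falls back to `STRING`), and the table name, columns and bucket path are placeholders.

```python
# Assumed mapping from pandas dtype names to Hive column types.
PANDAS_TO_HIVE = {
    "int64": "BIGINT",
    "int32": "INT",
    "float64": "DOUBLE",
    "float32": "FLOAT",
    "bool": "BOOLEAN",
    "object": "STRING",
    "datetime64[ns]": "TIMESTAMP",
}


def hive_ddl(table, dtypes, location):
    """Build a CREATE EXTERNAL TABLE statement for Parquet data.

    dtypes maps column name -> pandas dtype name (as a string).
    """
    columns = ",\n".join(
        f"  `{name}` {PANDAS_TO_HIVE.get(dtype, 'STRING')}"
        for name, dtype in dtypes.items()
    )
    return (
        f"CREATE EXTERNAL TABLE IF NOT EXISTS {table} (\n"
        f"{columns}\n"
        f")\n"
        f"STORED AS PARQUET\n"
        f"LOCATION '{location}'"
    )


# With pandas, the dtype mapping could come from
# df.dtypes.astype(str).to_dict(); here it is written out by hand.
ddl = hive_ddl(
    "my_table",
    {"id": "int64", "price": "float64", "city": "object"},
    "s3://your-bucket/my_table/",
)
```

The resulting string can then be run through whatever Hive client is in use, for example a Spark session with Hive support enabled.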

digital lynx May 16, 2022, 7:40 PM

#

#

anyone know how to remove the dots

#

import pandas as pd
import matplotlib.pyplot as plt

df = pd.read_csv("datasets/covid19cases_test.csv")
sonoma_df = df[df["area"] == "Sonoma"]

print(sonoma_df)

dates = pd.to_datetime(sonoma_df['date'])
cases = sonoma_df['cases']

#plt.style.use('seaborn')
plt.plot_date(dates, cases, linestyle='solid', linewidth=0.5)
plt.gcf().autofmt_xdate()

plt.title('Sonoma County Cases vs Time')
plt.xlabel('Date')
plt.ylabel('Cases')

plt.tight_layout()

plt.savefig('figures/sonoma_cases.png')

#

nvm i got it

main gorge May 16, 2022, 7:42 PM

#

main gorge How can I create hive schema according from dataframe i receive

I am still waiting for your input

frigid elk May 16, 2022, 7:43 PM

#

main gorge I am still waiting for your input

this may be of value.. https://stackoverflow.com/a/34344654/1538838

Stack Overflow

Can we load Parquet file into Hive directly?

I know we can load parquet file using Spark SQL and using Impala but wondering if we can do the same using Hive. I have been reading many articles but I am still confused.

Simply put, I have a pa...

#

anybody out here using palantir foundry cloud? curious what your workflow looks like for development, .. prior to racking up the bills in production

echo vigil May 16, 2022, 8:19 PM

#

How do you guys handle code review / PRs for jupyter notebooks? Since notebooks are painful to read in plain text.

pliant pewter May 16, 2022, 8:21 PM

#

VS Code does syntax highlighting in notebooks, this is my go-to way to work with notebooks now

echo vigil May 16, 2022, 8:23 PM

#

Ah ty, I've never used VS Code, can it show diffs?

pliant pewter May 16, 2022, 8:29 PM

#

probably, I've only explored a fraction of its features.

#

Diffs in Jupyter notebooks, though? I'm not sure

frigid elk May 16, 2022, 8:30 PM

#

echo vigil Ah ty, I've never used VS Code, can it show diffs?

yes, there is tight integration with git as well, ... you can show diffs between different files (in explorer) or diffs of different commits

pliant pewter May 16, 2022, 8:32 PM

#

Note however: this is after installing loads of plugins. I don't even remember what I've installed anymore, but fortunately there's a feature to transport the list from one installation to another.

echo vigil May 16, 2022, 8:32 PM

#

thanks both!

tidal bough May 16, 2022, 9:25 PM

#

VSCode does show diffs of jupyter notebooks, but badly: it doesn't exclude cells that have no diff.

#

So you have to scroll to find the cells that actually have changes.

#

E.g:

#

This is only with the Python extension and the stuff that comes with it like the Jupyter extension.

spare briar May 16, 2022, 10:06 PM

#

misty mulch nothing rlly, was starting to get into it yesterday and learning Linear regressi...

read the bible http://incompleteideas.net/book/the-book-2nd.html

desert oar May 16, 2022, 10:34 PM

#

spare briar read the bible http://incompleteideas.net/book/the-book-2nd.html

there's no site here

#

oh nvm i was doing https

delicate apex May 16, 2022, 11:12 PM

#

I've had a very persistent bug with Pandas for a while now, and I'm curious if any of you have a possible solution. Note that both of the problems here require pandas 1.4.0rc0 or greater, as pandas 1.3.5 works completely fine. I was hoping that the update to 1.4.2 back in April might quash these bugs, but it didn't.
Have confirmed existence of bugs on a separate computer.

import pandas as pd

df = pd.DataFrame({'A': [1, 2], 'B': [3, 4]})
dfsty = df.style
dfsty.applymap(lambda _: 'color: red')

with open('kaboom.html', 'w') as f:
    # version 1 - fails
    dfsty.to_html(f)  # regression: breaks with ValueError

    # version 2 - incorrect
    # f.write('bye bye style tag\n')  # mangles css style below
    # dfsty.to_html('kaboom.html')  # bad css due to above

    # version 3 - works, but deprecated
    # f.write(dfsty.render())

produces (redacted for brevity, can post full traceback and details of css bug if requested)

  File "C:\Users\Thurisatic\miniconda3\lib\site-packages\pandas\io\formats\format.py", line 1220, in get_buffer
    raise ValueError("buf is not a file name and encoding is specified.")
ValueError: buf is not a file name and encoding is specified.

iron peak May 17, 2022, 12:06 AM

#

Anyone online? I would like to ask for a favor.

desert oar May 17, 2022, 1:56 AM

#

delicate apex I've had a very persistant bug with Pandas for a while now, and I'm curious if a...

this is possibly a bug in pandas' handling of optional arguments. i would file a bug report. my guess is that the default encoding= argument changed, but they forgot to check for the case when the output isn't a filename. try encoding=None explicitly in to_html, as a test / workaround

serene scaffold May 17, 2022, 2:52 AM

#

iron peak Anyone online? I would like to ask for a favor.

you have to say what it is that you need help with.

iron peak May 17, 2022, 2:54 AM

#

serene scaffold you have to say what it is that you need help with.

Creating mock data for a quiz.

barren wedge May 17, 2022, 3:41 AM

#

What are embeddings in PyTorch?

mellow vortex May 17, 2022, 3:44 AM

#

how can i graph kinematic equations in matplotlib?

#

or how can i structure a kinematic equation to be a graphable function?
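One way to structure it: write the equation as a plain function of time, evaluate it over a list of times, and plot the two lists against each other. A minimal sketch using the constant-acceleration position equation x(t) = x0 + v0*t + a*t^2/2; the function name and the sample values (a ball thrown upward at 10 m/s) are made up for illustration.

```python
def position(t, x0=0.0, v0=0.0, a=0.0):
    """Position under constant acceleration: x0 + v0*t + 0.5*a*t**2."""
    return x0 + v0 * t + 0.5 * a * t ** 2


# Sample the first two seconds in steps of 0.1 s.
times = [i * 0.1 for i in range(21)]
heights = [position(t, x0=0.0, v0=10.0, a=-9.81) for t in times]
```

With Matplotlib installed, `plt.plot(times, heights)` followed by `plt.show()` would draw the curve.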

tacit basin May 17, 2022, 5:04 AM

#

misty mulch im trying to get into reinforcement learning, where should I begin?

This course is live now https://github.com/huggingface/deep-rl-class

GitHub

GitHub - huggingface/deep-rl-class: This repo contain the syllabus ...

This repo contain the syllabus of the Hugging Face Deep Reinforcement Learning Class.

brisk nest May 17, 2022, 5:18 AM

#

Hi guys, I've been comparing four models: LiR, RR, LASSO and EN. Am I wrong to assume that if a model is the best on one metric, it should also be the best on the other metrics? In the image below EN performs best on MAE, but on MSE and R2 it is LASSO. Am I doing something wrong here?

wooden sail May 17, 2022, 5:19 AM

#

that is indeed wrong to assume

brisk nest May 17, 2022, 5:20 AM

#

Interesting, can you please elaborate?

wooden sail May 17, 2022, 5:21 AM

#

each model will have its own properties, that's about it

#

that's the whole reason there are different models and different metrics. you have to try and find the best one for your use case. it goes deeper, too: you can read about the no free lunch theorems of estimation. all estimators (what you use to fit the chosen model) have the same performance on average. this means if they are good in one scenario, they are necessarily bad in others.
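A tiny worked example of metrics disagreeing, with made-up numbers: one set of predictions is slightly off everywhere, the other is exact except for one large miss. MAE prefers the second, MSE prefers the first, because squaring punishes the large miss much harder.

```python
def mae(y_true, y_pred):
    """Mean absolute error."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def mse(y_true, y_pred):
    """Mean squared error."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


y_true = [0.0, 0.0, 0.0, 0.0]
pred_a = [1.0, 1.0, 1.0, 1.0]  # always off by a little
pred_b = [0.0, 0.0, 0.0, 3.0]  # usually exact, one large miss

mae_a, mae_b = mae(y_true, pred_a), mae(y_true, pred_b)  # 1.0 vs 0.75
mse_a, mse_b = mse(y_true, pred_a), mse(y_true, pred_b)  # 1.0 vs 2.25
```

Which ranking matters depends on whether occasional large errors are worse for the use case than many small ones.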