#data-science-and-ml | Python | Page 299

pine vapor Mar 22, 2021, 11:25 AM

#

Question about method chaining with pandas, when I want to refer to a column created as part of the method within the chain, that will not exist as it all refers to the initial dataframe. How do I best handle this? Do I have to start another chain?
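
One common way to handle this without starting a second chain, sketched under assumptions: `DataFrame.assign` and `.loc` both accept callables, and a callable receives the intermediate frame at that point in the chain rather than the original variable. The column names (`price`, `qty`, `total`, `share`) are made up for illustration.

```python
import pandas as pd

df = pd.DataFrame({"price": [2.0, 3.0, 4.0], "qty": [10, 0, 5]})

result = (
    df
    # Create a new column inside the chain.
    .assign(total=lambda d: d["price"] * d["qty"])
    # `d` is the frame that already contains `total`, so this works.
    .assign(share=lambda d: d["total"] / d["total"].sum())
    # Filtering on the new column also takes a callable.
    .loc[lambda d: d["total"] > 0]
    .reset_index(drop=True)
)
print(result)
```

Writing `df["total"]` inside the chain would fail because `df` is still the initial frame; the lambda defers the lookup until the new column exists.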

solemn hinge Mar 22, 2021, 11:27 AM

#

offtopic: sorry if wrong channel. I just created an AI/ML starter pack here: https://www.reddit.com/r/ProgrammerHumor/comments/mal4eb/aiml_developer_starter_pack/

uneven gust Mar 22, 2021, 12:24 PM

#

hey can someone help me please?

#

i'm working on a project in R

#

does anyone know R?

candid sable Mar 22, 2021, 1:16 PM

#

hi guys - I'm getting

```
        y_pred.shape.assert_is_compatible_with(y_true.shape)

ValueError: Shapes (None, 2) and (None, 1) are incompatible
```
when trying to use more than only 1 metric in my model.compile.. why would that be?

when I only use metrics=['acc'], it works..
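
Nobody in the channel answers this, so the following is a guess. The failing assertion sits in the Keras helper used by confusion-matrix style metrics (`Precision`, `Recall`, `AUC` and similar), which need `y_pred` and `y_true` to have the same shape. The string `'acc'` is resolved to a sparse variant when the labels are integer class ids, which would explain why it alone works. Assuming a 2-unit softmax output and integer labels of shape `(n, 1)`, one-hot encoding the labels would align the shapes. NumPy-only sketch; `labels` is invented data.

```python
import numpy as np

labels = np.array([[0], [1], [1], [0]])  # shape (4, 1): integer class ids
num_classes = 2

# Row i of the identity matrix is the one-hot vector for class i.
one_hot = np.eye(num_classes, dtype="float32")[labels.ravel()]

print(labels.shape, "->", one_hot.shape)
```

With one-hot labels the loss would typically switch from the sparse to the plain `categorical_crossentropy`.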

grave frost Mar 22, 2021, 2:02 PM

#

I think that you have particularly strong opinions on some topics that may not necessarily align with reality or with the needs of someone else. For example, Kaggle's mini-courses do teach important aspects for beginners. While it may not be needed for you, in practice most beginners like to start with something small.

https://towardsdatascience.com/kaggles-micro-courses-my-favorite-introduction-to-data-science-f0cc6aeb024c Here, the author lists that Kaggle's mini-courses start with:

Data visualization
Pandas
Basic DL which covers Transfer Learning and Data augmentation
Intro and Advanced SQL
Geospatial analysis with GeoPandas
Basic NLP
Intro to RL that covers an agent using simple minimax

This differs somewhat from your statement-

In fact, it doesn't teach you ML at all
It gives you a tutorial thing which literally just calls the Decision Tree method without really explaining what it is

Which seems pretty wrong seeing the above evidence.

TBH I really admire your knowledge and apologize if I sound rude or always contradictory (because people in PyDis enjoy arguments a lot) but simply that everyone has an opinion - and if different people provide their perspectives to someone, the person on the other end receives a much better answer of their question overall.

hollow sentinel Mar 22, 2021, 2:11 PM

#

Kaggle is not good enough for beginners

#

I'm sorry dude

#

it's designed to show the highlights of ML while they spoonfeed you code

#

but it'll never give you a strong basis

#

so I agree with Raggy

hollow sentinel Mar 22, 2021, 2:16 PM

#

grave frost I think that you have particularly strong opinions on some topics that may not n...

did you even do the course?

#

bc if you didn't I don't think you have any standing to talk about it

grave frost Mar 22, 2021, 2:22 PM

#

hollow sentinel did you even do the course?

I did do the course when I was new to ML 🙂 If you want to see the amount of stuff they have, this is "Intro to DL", where I think the amount of math and code is about right for a beginner: https://www.kaggle.com/learn/intro-to-deep-learning

You can inspect more courses here https://www.kaggle.com/learn/overview and decide for yourself. As for me, they helped me when I was a beginner, so I stand by my opinion

Learn Intro to Deep Learning Tutorials

Use TensorFlow and Keras to build and train neural networks for structured data.

Learn Python, Data Viz, Pandas & More | Tutorials | Kaggle

Practical data skills you can apply immediately: that's what you'll learn in these free micro-courses. They're the fastest (and most fun) way to become a data scientist or improve your current skills.

tidal bronze Mar 22, 2021, 2:23 PM

#

Which metric could I use to evaluate different clusters given that the feature they are based on will differ?

hollow sentinel Mar 22, 2021, 2:24 PM

#

grave frost I did do the course when I was new to ML 🙂 if you want to see the amount of stu...

yeah sure you can learn the highlights of DS/ML

#

doesn't mean you actually know what you're doing

grave frost Mar 22, 2021, 2:24 PM

#

@hollow sentinel also, you can execute an exercise where it takes you to a different notebook which tries to explain the concepts learnt with a sample dataset 🤷

hollow sentinel Mar 22, 2021, 2:24 PM

#

especially when they spoonfeed you all the code

#

lmao

#

wow an exercise with a sample dataset

#

with all the code given to you

#

so innovative

#

fill in the blanks

#

lmao

grave frost Mar 22, 2021, 2:25 PM

#

@hollow sentinel I think your approach is that of someone very new to CS, because most people already know the basic coding required to start the courses from scratch

hollow sentinel Mar 22, 2021, 2:25 PM

#

grave frost @hollow sentinel I think what your approach is very new to CS, because mos...

I do know the basic coding to start from scratch

grave frost Mar 22, 2021, 2:25 PM

#

intro to DL is not the first course BTW

hollow sentinel Mar 22, 2021, 2:25 PM

#

I know

#

but it's not enough

grave frost Mar 22, 2021, 2:26 PM

#

it's like the 6th or 7th one. Before that they teach stuff like visualization and even more basic stuff

hollow sentinel Mar 22, 2021, 2:26 PM

#

micro courses are not enough to build any significant skills

grave frost Mar 22, 2021, 2:26 PM

#

ofc

hollow sentinel Mar 22, 2021, 2:26 PM

#

it's more like dipping your toes in

#

yes

grave frost Mar 22, 2021, 2:26 PM

#

hence the name "Micro-course"

hollow sentinel Mar 22, 2021, 2:26 PM

#

so you just conceded why you're wrong

#

congrats

grave frost Mar 22, 2021, 2:26 PM

#

yeah, but it's more than good enough to give an overview to a beginner

#

(especially when they are not in college)

hollow sentinel Mar 22, 2021, 2:27 PM

#

an overview when they literally know nothing but the basics of python

#

ok

grave frost Mar 22, 2021, 2:27 PM

#

hollow sentinel an overview when they literally know nothing but the basics of python

I wrote this before ^^ people who usually do the courses have been doing CS since school, not learning it first time in college. they can't cater to everyone

hollow sentinel Mar 22, 2021, 2:28 PM

#

it's just a way to cater excitement

#

generate hype

#

it does nothing to teach you

grave frost Mar 22, 2021, 2:28 PM

#

well, then I can't argue with you since you are just fueled on opinion rather than arguments 🙂 have a good day

hollow sentinel Mar 22, 2021, 2:28 PM

#

nice way to concede

tidal bronze Mar 22, 2021, 2:33 PM

#

tidal bronze Which metric could I use to evalute different clusters given that the feature th...

maybe help me instead of pointlessly arguing 😆

kindred radish Mar 22, 2021, 2:47 PM

#

So I've been trying to use K-means and Spectral Clustering to try and detect this cluster over here. I've read that these are good with evenly sized clusters, so does this mean it couldn't pick up on the cluster in the circled area?

ripe forge Mar 22, 2021, 2:49 PM

#

Uh. Visually that circle doesn't look like a separate cluster to me

#

Unless you only mean the little group of points off to its own side

kindred radish Mar 22, 2021, 2:49 PM

#

Really?

#

Yeah i do sorry i made the circle too big hahaha

#

ripe forge Mar 22, 2021, 2:49 PM

#

Look at where you've drawn the circle

#

Ah so you do mean the small group then?

kindred radish Mar 22, 2021, 2:50 PM

#

#

I mean this, i just didn't want the circle bit to cover the gap

ripe forge Mar 22, 2021, 2:50 PM

#

Got it. That is definitely better

kindred radish Mar 22, 2021, 2:50 PM

#

So i also know that there is a cluster here

ripe forge Mar 22, 2021, 2:50 PM

#

So, K means needs a k upfront, what output did you get with K set to 4?

kindred radish Mar 22, 2021, 2:51 PM

#

hol' up lemme go check

#

I'm using sklearn btw, so do you mean like the number of clusters I told it?

ripe forge Mar 22, 2021, 2:52 PM

#

As for clustering algorithms in general, the idea is they're usually doing their own thing. Usually you only really want to use them for exploration that leads up to something down the line

#

And yes, the number of clusters
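
For reference, a minimal sketch of what "setting K to 4" looks like in scikit-learn, assuming synthetic data in place of the dataset from the screenshots. `KMeans` takes the cluster count as `n_clusters`; `SpectralClustering` accepts the same argument. The four blob centres are invented.

```python
import numpy as np
from sklearn.cluster import KMeans

# Four well-separated synthetic blobs of 30 points each (stand-in data).
rng = np.random.default_rng(0)
centers = np.array([[0, 0], [10, 0], [0, 10], [10, 10]])
X = np.vstack([c + rng.normal(scale=0.5, size=(30, 2)) for c in centers])

# K has to be chosen up front; the algorithm will always return that many clusters.
kmeans_labels = KMeans(n_clusters=4, n_init=10, random_state=0).fit_predict(X)

print(np.bincount(kmeans_labels))
```

On data where the groups overlap or differ a lot in size, the same call still returns four clusters, just not necessarily the four you had in mind.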

kindred radish Mar 22, 2021, 2:53 PM

#

I'll show you what I get for 4 clusters:

kindred radish Mar 22, 2021, 2:54 PM

#

ripe forge And yes, the number of clusters

So i'm just using this to show to my supervisor how machine learning can be used to detect clusters. I'm not trying to get anything analytical from it

#

It looks like the spectral one is handling that cluster a little better, but it's slightly off

forest plover Mar 22, 2021, 2:54 PM

#

Where can I get started on machine learning and ai in general?

kindred radish Mar 22, 2021, 2:55 PM

#

forest plover Where can I get started on machine learning and ai in general?

#data-science-and-ml message is a book i was recommended in this chat ^^

candid sable Mar 22, 2021, 2:55 PM

#

Anyone can help me figure out why I'm getting incompatible shapes ValueError when using multiple metrics in my model.compile? If I only use 'acc', it works..

ripe forge Mar 22, 2021, 2:56 PM

#

I'm curious how dbscan would perform here.

forest plover Mar 22, 2021, 2:56 PM

#

Thank you

kindred radish Mar 22, 2021, 2:56 PM

#

oooooh i read about dbscan! Sklearn says it's good for uneven clusters right? But it said its use-case was for "non-flat geometry"?

misty flint Mar 22, 2021, 3:00 PM

#

@charred egret IRL example of genetic algos https://algorithms-tour.stitchfix.com/#new-style-development

Stitch Fix Algorithms Tour

How data science is woven into the fabric of Stitch Fix.

#

found it from a podcast. very nicely done

#

apparently made with D3

kindred radish Mar 22, 2021, 3:09 PM

#

ripe forge I'm curious how dbscan would perform here.

would you have any insight into setting the eps parameter for DBSCAN? It's not like Kmeans or Spectral where you can predefine how many clusters you're expecting

ripe forge Mar 22, 2021, 3:12 PM

#

So, dbscan is intended for spatial clustering, so yes on that.

#

However that shouldn't stop you from just seeing how it performs, since you can treat each feature as an axis in one dimension

#

For eps, it's simply a param to play around with, I'd say let it do its thing. Higher eps makes fewer clusters iirc

candid sable Mar 22, 2021, 3:14 PM

#

candid sable hi guys - I'm getting ```\venv\lib\site-packages\tensorflow\python\keras\utils\...

anyone?

ripe forge Mar 22, 2021, 3:14 PM

#

It's a distance tolerance: how close points have to be to count as neighbours

#

For any two points
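
A small sketch of how `eps` changes the outcome, assuming scikit-learn's `DBSCAN` and a hand-made toy dataset (two tight groups of five points, five units apart). It is only meant to show the direction of the effect, not to suggest values for the dataset discussed here.

```python
import numpy as np
from sklearn.cluster import DBSCAN

# Two tight groups of five points each; the second is shifted 5 units along x.
group = np.array([[0, 0], [0, 0.5], [0.5, 0], [0.5, 0.5], [0.25, 0.25]])
X = np.vstack([group, group + [5, 0]])

def n_clusters(eps):
    labels = DBSCAN(eps=eps, min_samples=3).fit_predict(X)
    return len(set(labels) - {-1})  # label -1 marks noise, not a cluster

tiny = n_clusters(0.1)   # neighbourhoods too small: everything is noise
small = n_clusters(1.0)  # each group becomes its own cluster
large = n_clusters(6.0)  # neighbourhoods bridge the gap: one cluster

print(tiny, small, large)
```

So a larger `eps` tends to merge clusters, but a very small one can also give fewer clusters because points get labelled as noise instead.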

kindred radish Mar 22, 2021, 3:14 PM

#

Well i've been playing around with it and I can't really get better than this:

#

#

Doesn't seem to even be able to tell the two big clusters apart a lot of the time

ripe forge Mar 22, 2021, 3:15 PM

#

Ah. Hmm. Guess that's not the move for this dataset then

kindred radish Mar 22, 2021, 3:15 PM

#

RIP

#

I guess spectral is giving me the best of other options ive tried. I went for MeanShift as well

grave frost Mar 22, 2021, 3:20 PM

#

kindred radish Ill show you what i get for 4 clusters:

what's wrong with these results BTW?

kindred radish Mar 22, 2021, 3:22 PM

#

The purple overlaps into the red, where the true cluster is just the tiny LHS island of the purple (for the spectral result)

#

#

Should look like that

grave frost Mar 22, 2021, 3:24 PM

#

your clusters aren't distinct enough to be identified by k-means.

#

A simple google yields me this paper that deals with clusters with high overlap http://ceur-ws.org/Vol-1455/paper-06.pdf their recommendation is to use some EM algorithm using another CBOvalue score to aid it (and they claim it works better than spectral)

kindred radish Mar 22, 2021, 3:28 PM

#

ah thank you for that! I'll go check it out ^^

tidal bronze Mar 22, 2021, 3:31 PM

#

Which metric could I use to evaluate different clusters given that the feature they are based on will differ but all are performed using kmeans

uncut orbit Mar 22, 2021, 3:59 PM

#

what are some good hyperparams for keras neural net?

pseudo wing Mar 22, 2021, 4:02 PM

#

Why is my SVC f1-score on training data in the first code and 2nd code different?

lapis sequoia Mar 22, 2021, 4:32 PM

#

Hey,

#

Not sure if this is the right place but I'm having some issues with matplotlib

#

#

this is my graph

#

it does not show the actual data

#

and the line should have a smooth increase

#

only like +100 per hour

#

but as you can see it bugs significantly

#

also, is there a way to make the graph fit to size?

#

Its a bit wide atm

#

```python
import tkinter as tk
import matplotlib.pyplot as plt
from pandas import DataFrame
from matplotlib.backends.backend_tkagg import FigureCanvasTkAgg
import random
import datetime

data = {'Price': [1],
        'Years': []
        }
time = datetime.datetime.now()
previous_number = 1
for x in range(1000):
    num = random.randint(1, 100)
    previous_number += num
    data["Price"].append(previous_number)
    data["Years"].append(time.strftime("%d/%m/%Y %H:%M:%S"))
    time += datetime.timedelta(hours=1)
data["Years"].append(time.strftime("%d/%m/%Y %H:%M:%S"))


df = DataFrame(data, columns = ['Price','Years'])
for x in df.values:
    print(x)

root = tk.Tk()
figure = plt.Figure(figsize=(1000,4), dpi=100)
ax = figure.add_subplot(111)
chart_type = FigureCanvasTkAgg(figure, root)
chart_type.get_tk_widget().pack()
df = df[['Price','Years']].groupby('Years').sum()
df=df.astype(float)
df.plot(kind='line', legend=True, ax=ax)
ax.set_title('Example')
```

#

this is my code atm

misty flint Mar 22, 2021, 4:40 PM

#

have you double-checked to see if your data is sorted

lapis sequoia Mar 22, 2021, 4:46 PM

#

misty flint have you double-checked to see if your data is sorted

yea

#

```python
data = {'Price': [1],
        'Years': []
        }
time = datetime.datetime.now()
previous_number = 1
for x in range(1000):
    num = random.randint(1, 100)
    previous_number += num
    data["Price"].append(previous_number)
    data["Years"].append(time.strftime("%d/%m/%Y %H:%M:%S"))
    time += datetime.timedelta(hours=1)
data["Years"].append(time.strftime("%d/%m/%Y %H:%M:%S"))
```

#

is how its generated

#

it should always be going up

#

but 100 max

tidal bronze Mar 22, 2021, 4:55 PM

#

Which metric could I use to evaluate different clusters given that the feature they are based on will differ but all are performed using kmeans

misty flint Mar 22, 2021, 4:59 PM

#

"should" but is it?

#

look at your price values once more

#

the actual data values

lapis sequoia Mar 22, 2021, 5:12 PM

#

@misty flint

#

i got the aspect ratio to fit

#

but

#

1s

#

#

https://pastebin.com/2s1kqWzn

Pastebin

[1 '22/03/2021 17:13:53'][97 '22/03/2021 18:13:53'][173 '22/03/2021...

Pastebin.com is the number one paste tool since 2002. Pastebin is a website where you can store text online for a set period of time.

#

how can i download with python images from google image search?

#

and @lapis sequoia just ran

#

```python
before = 0
total = 0
for line in string.splitlines():
    num = int(line.split("[")[1].split(" ")[0])
    if num <= before:
        total += 1
        print(before)
        print(num)
    before = num
```

#

basic checker to see if there are any anomalies in the data set but it never printed anything

#

its the graph

#

#

its only when showing it in tkinter
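
The thread leaves this unresolved, so this is only one plausible cause, stated as an assumption: `groupby('Years')` sorts its keys by default, and the keys are `dd/mm/YYYY` strings, which sort by day of month first. The raw prices would then be increasing (which matches the checker finding nothing) while the grouped frame that gets plotted is reordered once the 1000 hourly rows cross a month boundary. The standard-library sketch below reproduces just the ordering effect; the fixed start date is invented.

```python
import datetime

start = datetime.datetime(2021, 3, 22, 17, 0, 0)
stamps = [start + datetime.timedelta(hours=h) for h in range(1000)]

# Same format string as in the posted code.
as_text = [t.strftime("%d/%m/%Y %H:%M:%S") for t in stamps]
sorted_text = sorted(as_text)  # the order a sorted string key would give

# ISO-style strings (or real datetimes) keep chronological order when sorted.
as_iso = [t.isoformat() for t in stamps]
sorted_iso = sorted(as_iso)

print(as_text[0], "vs", sorted_text[0])
print("dd/mm/YYYY strings stay in order:", sorted_text == as_text)
print("ISO strings stay in order:", sorted_iso == as_iso)
```

If that is what is happening, keeping the column as real datetimes (for example with `pandas.to_datetime(..., dayfirst=True)`) or dropping the `groupby` would be worth trying. Separately, `figsize` is given in inches, so `figsize=(1000, 4)` at `dpi=100` asks for a figure 100,000 pixels wide.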

tidal bronze Mar 22, 2021, 5:47 PM

#

Which metric could I use to evaluate different clusters given that the feature they are based on will differ but all are performed using kmeans

grave frost Mar 22, 2021, 5:52 PM

#

could you elaborate on your problem?

limber vector Mar 22, 2021, 6:38 PM

#

Can anyone suggest some site from which I can get a stream of data / free API to be fed into pyspark

left arch Mar 22, 2021, 6:42 PM

#

Hello everyone, sorry if this is the wrong channel but I figured this was a basic data science question. I am trying to have a pandas column that will get the percentage change for stock data between yesterday's close and today's open. I cannot seem to find the right terms to google for this very simple task. Attached is the Excel formula I am trying to reproduce in my Python script. Anything helps, thank you!
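
A minimal sketch of the usual approach, assuming one row per trading day in date order and columns named `Open` and `Close` (the names and numbers are made up, since the attached sheet is not visible here). `Series.shift(1)` moves yesterday's close onto today's row, which is the pandas counterpart of referring to the cell one row up in Excel.

```python
import pandas as pd

prices = pd.DataFrame(
    {"Open": [102.0, 99.0, 101.0], "Close": [100.0, 100.0, 104.0]},
    index=pd.to_datetime(["2021-03-18", "2021-03-19", "2021-03-22"]),
)

# Yesterday's close, aligned with today's row (first row has no "yesterday").
prev_close = prices["Close"].shift(1)

# Percentage change from yesterday's close to today's open.
prices["gap_pct"] = (prices["Open"] - prev_close) / prev_close * 100

print(prices)
```

The first row comes out as `NaN` because there is no previous close to compare against.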

#

grave frost Mar 22, 2021, 7:19 PM

#

I read an account where a person trained a word2vec model on their own dataset and then used those vectors to train the model. That seems strange: can we expect a reasonable boost in accuracy from an embedding model trained on little data, and can it capture the contextual vector for each word? It doesn't seem right to me, but maybe you guys can illuminate this issue

candid sable Mar 22, 2021, 8:53 PM

#

I have a retrained InceptionV3 model on Keras and I'm getting 100% acc and val_acc from first to last epoch.. and ofc it's not accurate when I predict.

what could be wrong?

solid quest Mar 22, 2021, 9:04 PM

#

Anyone know any good source of big datasets? I need at least 4 GB of data for a project where I need to apply clustering through Apache Spark

grave frost Mar 22, 2021, 9:37 PM

#

How brain handles all the noise and still is able to maintain accurate perception: (non-scientific articles) https://neurosciencenews.com/perception-neuron-misfire-16129/

Neuroscience News

Misfiring from jittery neurons sets fundamental limit on perception...

Study provides new evidence supporting the theory that perceptual limitations are caused by a correlated noise in neural activity.

misty flint Mar 22, 2021, 9:58 PM

#

forecasting tool https://facebook.github.io/prophet/

Prophet

Prophet is a forecasting procedure implemented in R and Python. It is fast and provides completely automated forecasts that can be tuned by hand by data scientists and analysts.

misty flint Mar 22, 2021, 10:18 PM

#

take a sneak peek. the visuals are mind-blowing

#

🧠

#

made with d3

#

they had a full stack data scientist doing it

quasi sparrow Mar 22, 2021, 10:31 PM

#

lapis sequoia how can i download with python images from google image search?

I'm pretty sure there is an image scraper somewhere in the web.

#

Do you guys prefer pandas over conventional NOSQL? I'm trying to figure out if I should keep learning SQL as a DevOps

#

I mean, I'm not a DevOps yet but I'm working towards it, lol

shell summit Mar 22, 2021, 10:45 PM

#

So I developed a rudimentary chess AI, but it’s slow as hell. Any way I can speed it up?

grave frost Mar 22, 2021, 10:47 PM

#

that's because brute-forcing takes time

tidal bough Mar 22, 2021, 10:48 PM

#

if you're searching the state-space in Python, it's just going to be slow in general

#

pretty much just because it's the kind of thing Python is slow at: iteration and number-crunching

lean ledge Mar 22, 2021, 10:50 PM

#

python's fine at number crunching using numpy, just iteration and general business logic is slow

tidal bough Mar 22, 2021, 10:50 PM

#

In general, though:

Profile your program. (Every single optimization must start with that step)
See what the most expensive functions are, and consider if you can't speed them up, or even rewrite them in something like Cython.
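
The profiling step can be sketched with the standard library's `cProfile` and `pstats`. The two functions are hypothetical stand-ins for a chess engine's evaluation and search code, not anything from the poster's program.

```python
import cProfile
import io
import pstats

def slow_eval(n):
    # Hypothetical stand-in for an expensive position evaluation.
    return sum(i * i for i in range(n))

def search():
    # Hypothetical stand-in for the search loop that calls the evaluation.
    return [slow_eval(20_000) for _ in range(50)]

profiler = cProfile.Profile()
profiler.enable()
result = search()
profiler.disable()

# Sort by cumulative time so the functions that dominate the run come first.
stream = io.StringIO()
pstats.Stats(profiler, stream=stream).sort_stats("cumulative").print_stats(10)
report = stream.getvalue()
print(report)
```

The rows with the largest cumulative time are the candidates for speeding up or rewriting.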

tidal bough Mar 22, 2021, 10:51 PM

#

lean ledge python's fine at number crunching using numpy, just iteration and general busine...

yeah, pretty much. The general idea is that the heavy stuff should be in a faster language - numpy is an example of a library doing that under the hood.

#

but if there isn't a library implementing what you want, your only choice is to learn how to rewrite functions in numba/cython/whatever yourself.
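
To make the "heavy stuff in a faster language" point concrete, here is a small sketch comparing a pure-Python loop with the same arithmetic done by NumPy. The workload (sum of squares) is an arbitrary example and no timings are claimed.

```python
import numpy as np

values = list(range(100_000))

# Pure-Python loop: every iteration goes through the interpreter.
loop_total = 0
for v in values:
    loop_total += v * v

# NumPy: the same arithmetic runs inside compiled code.
arr = np.array(values, dtype=np.int64)
numpy_total = int(np.dot(arr, arr))

print(loop_total == numpy_total)
```

A game-tree search is harder to express this way because it is mostly branching logic, which is why tools like numba or Cython come up for that case.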

grave frost Mar 22, 2021, 10:55 PM

#

A guy did try to brute-force in chess and implemented some sophisticated techniques to reduce time. it still wasn't enough (he used C++)

#

so I doubt python contributes much to it

#

The problem is that chess just has too many combinations. Not as many as Go thankfully, but it's still pretty significant

#

so the best way is to train an AI

lean ledge Mar 22, 2021, 10:58 PM

#

grave frost so the best way is to train an AI

lmao no

tidal bough Mar 22, 2021, 10:58 PM

#

I mean, it's true that search is just slow, but also it'd probably be like a hundred times faster in C++ than in Python 😅

lean ledge Mar 22, 2021, 10:58 PM

#

more like 200

tidal bough Mar 22, 2021, 10:58 PM

#

also, isn't this how classical (non-ML) chess engines work? They can be very advanced.

lean ledge Mar 22, 2021, 10:59 PM

#

search engines are sorta different, but basically every game AI you encountered in any game up until the last couple years had no learning

grave frost Mar 22, 2021, 10:59 PM

#

I agree with all your points, but just that with so many combinations there is no reasonable way to speed it up. minimum, it takes 20-30 minutes for each move

grave frost Mar 22, 2021, 11:00 PM

#

lean ledge search engines are sorta different, but basically every game AI you encountered ...

oh, you mean the really basic ones?

lean ledge Mar 22, 2021, 11:00 PM

#

...but we've had much faster chess AIs for like

#

decades

grave frost Mar 22, 2021, 11:00 PM

#

I thought we were trying to get the best possible player 🥴

#

my bad

lean ledge Mar 22, 2021, 11:00 PM

#

even state of the art ML doesn't give the best player possible, that's not reasonably searchable as a space

grave frost Mar 22, 2021, 11:01 PM

#

lean ledge even state of the art ML doesn't give the best player possible, that's not reaso...

AlphaGo did use that in Go

lean ledge Mar 22, 2021, 11:01 PM

#

nope, it's still not the optimal player, just a really really good one

tidal bough Mar 22, 2021, 11:01 PM

#

grave frost I agree with all your points, but just that with so many combinations there is n...

I mean... chess engines exist.

grave frost Mar 22, 2021, 11:01 PM

#

and it beat the world champion twice, so it's not much up to debate

lean ledge Mar 22, 2021, 11:01 PM

#

also chess engines were beating grand champions decades ago

grave frost Mar 22, 2021, 11:01 PM

#

lean ledge also chess engines were beating grand champions decades ago

Go

lean ledge Mar 22, 2021, 11:01 PM

#

it's not hard to write a chess engine that is better than all humans

grave frost Mar 22, 2021, 11:01 PM

#

not chess

lean ledge Mar 22, 2021, 11:02 PM

#

we're talking about chess right?

#

how is go relevant

grave frost Mar 22, 2021, 11:02 PM

#

but chess was beaten for the first time by IBM

#

using ML

deft ruin Mar 22, 2021, 11:02 PM

#

Chess engines definitely don’t search the whole state space, but they have efficient methods for pruning nodes with bad moves and taking into account transposition
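
The pruning idea can be illustrated with alpha-beta search on a tiny hand-written game tree. This is a toy sketch, not how any particular engine is implemented: leaves are integer scores, inner nodes are lists of children, and the tree itself is invented. Transposition tables (caching positions reached by different move orders) are not shown.

```python
def minimax(node, maximizing, counter):
    # Plain minimax: visits every leaf.
    if isinstance(node, int):
        counter[0] += 1
        return node
    values = [minimax(child, not maximizing, counter) for child in node]
    return max(values) if maximizing else min(values)

def alphabeta(node, maximizing, counter, alpha=float("-inf"), beta=float("inf")):
    # Same result as minimax, but stops exploring a branch as soon as it
    # cannot change the decision higher up the tree.
    if isinstance(node, int):
        counter[0] += 1
        return node
    if maximizing:
        best = float("-inf")
        for child in node:
            best = max(best, alphabeta(child, False, counter, alpha, beta))
            alpha = max(alpha, best)
            if alpha >= beta:
                break  # the minimizing parent will never pick this branch
        return best
    best = float("inf")
    for child in node:
        best = min(best, alphabeta(child, True, counter, alpha, beta))
        beta = min(beta, best)
        if alpha >= beta:
            break  # the maximizing parent already has something better
    return best

tree = [[3, 5, 10], [2, 9, 8], [0, 7, 4]]  # root is a maximizing node

plain_count, pruned_count = [0], [0]
plain_value = minimax(tree, True, plain_count)
pruned_value = alphabeta(tree, True, pruned_count)
print(plain_value, plain_count[0], pruned_value, pruned_count[0])
```

Both searches return the same value; the pruned one simply evaluates fewer leaves, and the saving grows quickly with depth and good move ordering.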

tidal bough Mar 22, 2021, 11:02 PM

#

there's stuff like this which is completely ML-less and still human-level
https://en.wikipedia.org/wiki/Stockfish_(chess)

Stockfish can use up to 512 CPU threads in multiprocessor systems. The maximal size of its transposition table is 32 TB. Stockfish implements an advanced alpha–beta search and uses bitboards. Compared to other engines, it is characterized by its great search depth, due in part to more aggressive pruning, and late move reductions.[4] As of November 2020, Stockfish 12 (4-threaded) achieves an Elo rating of 3516 (+24/−20) on the CCRL 40/15 benchmark.[5]
though it does get murdered by ML chess players:
In December 2017, Stockfish 8 was used as a benchmark to test Google division Deepmind's AlphaZero, with each engine supported by different hardware. AlphaZero was trained through self-play for a total of nine hours, and reached Stockfish's level after just four.[48][49][50] In 100 games from the normal starting position, AlphaZero won 25 games as White, won 3 as Black, and drew the remaining 72, with 0 losses.[51] AlphaZero also played twelve 100-game matches against Stockfish starting from twelve popular openings for a final score of 290 wins, 886 draws and 24 losses, for a point score of 733:467.[52][note 1]

Stockfish (chess)

Stockfish is a free and open-source chess engine, available for various desktop and mobile platforms. It is developed by Marco Costalba, Joona Kiiski, Gary Linscott, Tord Romstad, Stéphane Nicolet, Stefan Geschwentner, and Joost VandeVondele, with many contributions from a community of open-source developers.Stockfish is consistently ranked firs...

grave frost Mar 22, 2021, 11:02 PM

#

On this day 21 years ago, the world changed forever when a computer beat the then-chess champion of the world at his own game. On February 10, 1996, Deep Blue beat Garry Kasparov in the first game of a six-game match, the first time a computer had ever beat a human in a formal chess game.

#

yeah, the best chess player is always an AI hands down

lean ledge Mar 22, 2021, 11:04 PM

#

mostly because the "AI" is just a normal chess engine being sped up with a good heuristic

#

so it's more of a normal engine++

tidal bough Mar 22, 2021, 11:04 PM

#

questionable. Are there any model-free-learning-based chess AIs?

lean ledge Mar 22, 2021, 11:05 PM

#

there might be but they won't beat stockfish :p

grave frost Mar 22, 2021, 11:05 PM

#

tidal bough questionable. Are there any model-free-learning-based chess AIs?

how are you even supposed to google that? "computer beat human in chess with no ai"

lean ledge Mar 22, 2021, 11:05 PM

#

model-free RL chess?

grave frost Mar 22, 2021, 11:05 PM

#

lean ledge model-free RL chess?

that's not a good google search term

#

try it tho, it might be 😉 I just get 3D models lol

lean ledge Mar 22, 2021, 11:06 PM

#

got it lol

#

https://deepmind.com/blog/article/muzero-mastering-go-chess-shogi-and-atari-without-rules

Deepmind

MuZero: Mastering Go, chess, shogi and Atari without rules

In 2016, we introduced AlphaGo, the first artificial intelligence (AI) program to defeat humans at the ancient game of Go. Two years later, its successor - AlphaZero - learned from scratch to master Go, chess and shogi. Now, in a paper in the journal Nature, we describe MuZero, a significant step forward in the pursuit of general-purpose algorit...

grave frost Mar 22, 2021, 11:07 PM

#

its a model....?

#

MuZero just models aspects that are important to the agent’s decision-making process. After all, knowing an umbrella will keep you dry ....

tidal bough Mar 22, 2021, 11:08 PM

#

wtf

lean ledge Mar 22, 2021, 11:08 PM

#

I don't think you know what you're talking about enough to actually have a conversation about this. It's not given a model of the game which includes the rules of chess

tidal bough Mar 22, 2021, 11:08 PM

#

why are they calling it model-free then

tidal bough Mar 22, 2021, 11:09 PM

#

lean ledge I dont think you know what you're talking about enough to actually have a conver...

wait, what do they mean by "aspects that are important to the agent’s decision-making process" then?

grave frost Mar 22, 2021, 11:09 PM

#

and why do they use reward, value and policy then? sounds kinda like RL to me

lean ledge Mar 22, 2021, 11:09 PM

#

MuZero learns a model that, when applied iteratively, predicts the quantities most directly relevant to planning: the reward, the action-selection policy, and the value function.

lean ledge Mar 22, 2021, 11:10 PM

#

grave frost and why do they use reward, value and policy then? sounds kinda like RL to me

Yes, it's model-free reinforcement learning

tidal bough Mar 22, 2021, 11:10 PM

#

Specifically, MuZero models three elements of the environment that are critical to planning:
The value: how good is the current position?
The policy: which action is the best to take?
The reward: how good was the last action?

oh, that's just what all RL (or at least everything derived from q-learning) does. Not sure why are they calling it modelling, tbh.

grave frost Mar 22, 2021, 11:10 PM

#

https://paperswithcode.com/method/muzero

MuZero is a model-based reinforcement learning algorithm.
LOLOL

Papers with Code - MuZero Explained

MuZero is a model-based reinforcement learning algorithm. It builds upon AlphaZero's search and search-based policy iteration algorithms, but incorporates a learned model into the training procedure.

The main idea of the algorithm is to predict those aspects of the future that are directly relevant for planning. The model receives the observat...

lean ledge Mar 22, 2021, 11:11 PM

#

tidal bough > Specifically, MuZero models three elements of the environment that are critica...

It's learning the game dynamics without being given the rules

lean ledge Mar 22, 2021, 11:11 PM

#

grave frost https://paperswithcode.com/method/muzero > MuZero is a model-based reinforcement...

Why are you going lololol

#

This is exactly what I've been saying all the time

#

You haven't "owned" anyone, you just don't understand the conversation we're having

grave frost Mar 22, 2021, 11:11 PM

#

model-free reinforcement learning
^ that's what you said
MuZero is a model-based reinforcement learning
^ paper

lean ledge Mar 22, 2021, 11:12 PM

#

Sorry, I should have been more precise, it has a model of the graph structure, it doesn't have the dynamics of the game

grave frost Mar 22, 2021, 11:13 PM

#

that kinda seems like RL, but you can help me understand the difference

The model receives the observation (e.g. an image of the Go board or the Atari screen) as an input and transforms it into a hidden state. The hidden state is then updated iteratively by a recurrent process that receives the previous hidden state and a hypothetical next action. At every one of these steps the model predicts the policy (e.g. the move to play), value function (e.g. the predicted winner), and immediate reward (e.g. the points scored by playing a move). The model is trained end-to-end, with the sole objective of accurately estimating these three important quantities, so as to match the improved estimates of policy and value generated by search as well as the observed reward.
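
The loop described in that quote can be sketched as three functions chained together: a representation step, a dynamics step and a prediction step. This is a toy illustration only. In the real system all three are learned neural networks trained end-to-end, whereas here they are hand-written placeholders with made-up outputs, and every function name is an assumption.

```python
def represent(observation):
    # Representation: observation -> hidden state (placeholder: a tuple copy).
    return tuple(observation)

def dynamics(state, action):
    # Dynamics: (hidden state, hypothetical action) -> (reward, next hidden state).
    next_state = state + (action,)
    reward = float(action)  # made-up reward
    return reward, next_state

def predict(state):
    # Prediction: hidden state -> (policy, value).
    policy = {0: 0.5, 1: 0.5}  # made-up uniform policy over two actions
    value = float(sum(state))  # made-up value estimate
    return policy, value

def unroll(observation, actions):
    # Encode the observation once, then step the hidden state forward
    # through hypothetical actions without ever consulting the real game.
    state = represent(observation)
    policy, value = predict(state)
    outputs = [(policy, value, 0.0)]
    for action in actions:
        reward, state = dynamics(state, action)
        policy, value = predict(state)
        outputs.append((policy, value, reward))
    return outputs

steps = unroll(observation=[1, 0], actions=[1, 0, 1])
for policy, value, reward in steps:
    print(policy, value, reward)
```

Nothing in `unroll` calls the actual game rules after the first observation; planning runs entirely on hidden states.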

tidal bough Mar 22, 2021, 11:14 PM

#

lean ledge mostly because the "AI" is just a normal chess engine being sped up with a good ...

anyway, so MuZero is model-free and beats chess engines and even older AIs. So sure, it's easier to just plug an ML heuristic to a search engine, but it surely isn't the only way things can go.

grave frost Mar 22, 2021, 11:15 PM

#

so...the only difference is there in this part

the hidden states are free to represent state in whatever way is relevant to predicting current and future values and policies. Intuitively, the agent can invent, internally, the rules or dynamics that lead to most accurate planning.
like you mean it does not model the game or its rules

lean ledge Mar 22, 2021, 11:16 PM

#

It's not completely model-free, it just doesn't have the dynamics

lean ledge Mar 22, 2021, 11:16 PM

#

grave frost so...the only difference is there in this part > the hidden states are free to r...

yes, it doesn't have a dynamic model of the game which alphazero and other chess engines usually have

tidal bough Mar 22, 2021, 11:16 PM

#

I should probably try reading that paper, lol

lean ledge Mar 22, 2021, 11:17 PM

#

grave frost Mar 22, 2021, 11:17 PM

#

lean ledge yes, it doesn't have a dynamic model of the game which alphazero and other chess...

but if alphazero can generalize over multiple games (chess, go shogi) how does not having a dynamic model make a difference as long as it can generalize?

lean ledge Mar 22, 2021, 11:18 PM

#

alphazero does have a dynamic model lol, you have to program it in

tidal bough Mar 22, 2021, 11:19 PM

#

that's just how nearly all RL works. If I understand it right, the idea is that the transitions between current and possible future states depending on the action are learned by the model (instead of programmed in)... and also not stored at all, I think, just used on each value update?

lean ledge Mar 22, 2021, 11:20 PM

#

RL can work entirely on the observables (image, etc) of program (model free), or it can have an idea of the dynamics and/or the decision making structure (model)

#

This one is partially model free in not having the dynamics but modelled in that it's not working on the direct observables but a parsed structure with decision information
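
For contrast, here is a minimal sketch of the fully model-free end of that spectrum: tabular Q-learning on an invented four-state corridor. The learner never inspects the `step` function; it only sees sampled transitions and rewards. The environment, hyperparameters and seed are all assumptions chosen for illustration.

```python
import random

N_STATES = 4      # states 0..3; state 3 is terminal
ACTIONS = (0, 1)  # 0 = left, 1 = right

def step(state, action):
    # The environment. The learner only ever observes the results of calling it.
    next_state = max(state - 1, 0) if action == 0 else state + 1
    reward = 1.0 if next_state == N_STATES - 1 else 0.0
    return next_state, reward

rng = random.Random(0)
q = [[0.0, 0.0] for _ in range(N_STATES)]
alpha, gamma = 0.5, 0.9

for _ in range(500):
    state = 0
    while state != N_STATES - 1:
        action = rng.choice(ACTIONS)  # purely random exploration
        next_state, reward = step(state, action)
        # Q-learning update: move the estimate toward reward + discounted best next value.
        target = reward + gamma * max(q[next_state])
        q[state][action] += alpha * (target - q[state][action])
        state = next_state

# Greedy action per non-terminal state after learning.
greedy = [max(ACTIONS, key=lambda a: q[s][a]) for s in range(N_STATES - 1)]
print(greedy)
```

A model-based method would additionally learn or be given something like `step` itself and plan with it, which is the part the discussion above is about.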

grave frost Mar 22, 2021, 11:21 PM

#

im not much familiar with RL, but wasn't the whole point to learn an environment without any prior hard coding (contrary to the dynamic model that we have to program in it)

lean ledge Mar 22, 2021, 11:21 PM

#

Not of AlphaZero

#

MuZero learns dynamics, sort of, it learns the dynamics of the optimal control based values

iron basalt Mar 22, 2021, 11:22 PM

#

grave frost im not much familiar with RL, but wasn't the whole point to learn an environment...

Not really, for example, does a human learn without any prior "hard coding"? No, genetics are a thing.

grave frost Mar 22, 2021, 11:22 PM

#

and AZ does not; it has to be hard coded. that seems like a workaround

grave frost Mar 22, 2021, 11:22 PM

#

iron basalt Not really, for example, does a human learn without any prior "hard coding"? No,...

true

lean ledge Mar 22, 2021, 11:23 PM

#

But MuZero is still fundamentally based on a graph search, it's still doing Monte Carlo Tree Search

#

It's just learning some dynamics alongside the fundamental decision making MDP model

grave frost Mar 22, 2021, 11:23 PM

#

but conventional consumer level stuff can usually solve simple environments without any hard coding, so I assumed that other techniques just scale up the complexity 🤷 sad

lean ledge Mar 22, 2021, 11:24 PM

#

grave frost but conventional consumer level stuff can usually solve simple environments with...

not really? RL tends to only solve without hardcoding for simpler mostly low dimensional problems

iron basalt Mar 22, 2021, 11:24 PM

#

The chess "AI"'s are kind of fuzzy when it comes to model vs model-free. Of course, one does not need to model everything, only parts of some task could be modeled.

lean ledge Mar 22, 2021, 11:24 PM

#

RL is very, very far behind what people want to think it is

grave frost Mar 22, 2021, 11:25 PM

#

lean ledge not really? RL tends to only solve without hardcoding for simpler mostly low dim...

like that survival simulation one of OpenAi; that's kinda complex but it doesn't require hard-coding (at least according to the narrator)

lean ledge Mar 22, 2021, 11:25 PM

#

It's productive to read some summarisation of how ridiculously bad RL can be, even if someone with a few million to spare has been able to "solve" Go
https://arxiv.org/pdf/1806.09460.pdf
https://www.alexirpan.com/2018/02/14/rl-hard.html

Deep Reinforcement Learning Doesn't Work Yet

grave frost Mar 22, 2021, 11:39 PM

#

Just a pretty naive way to solve the possible RL problem: while an agent is randomly searching its space, why can't we inject a pseudo-random combination of actions that represents, to a great degree, the task we want it to do, and have the model just optimize it for maximum reward (with a well-defined and thought-out reward function)?

Wouldn't this allow it to learn a complex task if we give it a boost at the start (like a nudge in the correct direction) so that it can easily make the connection on the best way to accomplish it?

tidal bough Mar 22, 2021, 11:42 PM

#

it seems to me like you just described what all RL agents that get taught on human records do

#

like, AlphaZero is Zero because it only got taught on its own games - it wasn't primed by learning on tons of human matches like AlphaGo. As a result, AlphaZero took a lot longer to learn, but ended up better at the end.

grave frost Mar 22, 2021, 11:45 PM

#

umm, I may be misunderstanding you here, but what I meant is just to provide a skeleton of the possible action that the RL algo should take to help it accomplish pretty complex tasks and get a general idea of how it is supposed to solve a particular environment.

austere swift Mar 23, 2021, 2:46 AM

#

does anybody here actually use cupy?

#

i've tried using it once and it had all sorts of errors

#

and it seems nice but i've never found it super helpful anyways

misty flint Mar 23, 2021, 2:47 AM

#

@stiff barn @rapid fog yo yo

stiff barn Mar 23, 2021, 2:47 AM

#

ayy

misty flint Mar 23, 2021, 2:47 AM

#

google analytics ive heard is a real nice tool

#

for tracking

#

does google have AutoML or is that a dif cloud provider?

rapid fog Mar 23, 2021, 2:47 AM

#

What's the easiest way to host postgres on the cloud for a discord bot?

#

Just on the VPS?

stiff barn Mar 23, 2021, 2:48 AM

#

Google Cloud has hosted postgres

misty flint Mar 23, 2021, 2:48 AM

#

ValkNaruhodo

rapid fog Mar 23, 2021, 2:48 AM

#

Is it free?

stiff barn Mar 23, 2021, 2:48 AM

#

Can also use digital ocean

misty flint Mar 23, 2021, 2:48 AM

#

i see

#

oh digital ocean

stiff barn Mar 23, 2021, 2:48 AM

#

Yeah, GCP has auto ml. Most of the clouds do

misty flint Mar 23, 2021, 2:48 AM

#

that one seems interesting

#

interesting

stiff barn Mar 23, 2021, 2:48 AM

#

You get plenty of free credits with both @rapid fog

#

Digital Ocean is my go-to for smaller projects. Has easy-to-launch VPS, managed databases, Kubernetes, etc.

misty flint Mar 23, 2021, 2:49 AM

#

ValkNaruhodo

rapid fog Mar 23, 2021, 2:49 AM

#

I'll take a look. Thank you!

stiff barn Mar 23, 2021, 2:50 AM

#

No problem

misty flint Mar 23, 2021, 2:50 AM

#

i need to become more familiar with the cloud

#

maybe this summer

#

when working with AWS

#

DoggoKek

stiff barn Mar 23, 2021, 2:50 AM

#

AWS is still the most popular so a good place to start

misty flint Mar 23, 2021, 2:51 AM

#

i always feel like in this field there is always an endless amount of things to learn

#

CryLaugh

#

if you want to stay relevant

stiff barn Mar 23, 2021, 2:52 AM

#

Haha yeah you can never learn it all.

misty flint Mar 23, 2021, 2:52 AM

#

do you do any testing/unit-testing

#

people have said i should also learn that

#

this never-ending bucket list... ID_BoomKek

stiff barn Mar 23, 2021, 2:53 AM

#

Yeah, I write test cases for every function/method I build.

misty flint Mar 23, 2021, 2:53 AM

#

that sounds like good swe practice

stiff barn Mar 23, 2021, 2:54 AM

#

It becomes natural very quickly. Writing tests in Python is pretty intuitive.

misty flint Mar 23, 2021, 2:54 AM

#

do you use..what is it called

#

pytest

#

pithink

stiff barn Mar 23, 2021, 2:54 AM

#

just the native unittest library

misty flint Mar 23, 2021, 2:54 AM

#

ah

#

ValkNaruhodo

#

i see

stiff barn Mar 23, 2021, 2:55 AM

#

Next time you go to test some python code manually just try to write a test instead and you might find that it actually makes your life easier.
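A minimal sketch of what "write a test instead" can look like with the standard-library `unittest` module mentioned above. The function under test, `normalize_price`, is a hypothetical helper invented for this example; the point is only that the checks you would otherwise type into a REPL become repeatable test methods.

```py
import unittest


def normalize_price(text):
    """Hypothetical helper: turn a string like '$1,250' into the float 1250.0."""
    return float(text.replace("$", "").replace(",", "").strip())


class NormalizePriceTests(unittest.TestCase):
    def test_plain_number(self):
        self.assertEqual(normalize_price("1250"), 1250.0)

    def test_symbol_and_commas(self):
        self.assertEqual(normalize_price("$1,250"), 1250.0)

    def test_rejects_garbage(self):
        # float() raises ValueError for text that is not a number.
        with self.assertRaises(ValueError):
            normalize_price("n/a")
```

Saved in a file such as `test_prices.py` (name assumed), this can be run with `python -m unittest test_prices`.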

misty flint Mar 23, 2021, 2:55 AM

#

hmm

#

i need to remember this

#

ValkNaruhodo

#

anyway

#

how goes your ML studies

#

BongoCat

stiff barn Mar 23, 2021, 2:56 AM

#

Going well. Working on the finishing touches of a recommendation engine I've been building for a while.

#

Had to build too many pieces for it haha

lean ledge Mar 23, 2021, 2:58 AM

#

Should go with Azure AutoML for totally unbiased reasons

misty flint Mar 23, 2021, 2:59 AM

#

noice

#

Praise

#

glad its coming together for you

lean ledge Mar 23, 2021, 2:59 AM

#

stiff barn Going well. Working on the finishing touches of a recommendation engine I've bee...

What type?

stiff barn Mar 23, 2021, 3:02 AM

#

lean ledge What type?

Somewhat nontraditional. I built a binary classification multi-modal model that takes the apartment images and processes those via a CNN, then structured data as a DNN, then concatenates them. I'm recommending apartments to just myself.

misty flint Mar 23, 2021, 3:02 AM

#

hey, its a nice use case

#

DoggoKek

#

one guy on a podcast i heard built a neural net just for tinder swipes

#

for himself

lean ledge Mar 23, 2021, 3:03 AM

#

Ah so it's supervised content based recommendation

#

I was expecting collaborative filtering

stiff barn Mar 23, 2021, 3:04 AM

#

lean ledge I was expecting collaborative filtering

Yeah, there is just 1 user so hard to use traditional techniques. Plus I wanted to experiment with multiple input type models.

stiff barn Mar 23, 2021, 3:05 AM

#

misty flint one guy on a podcast i heard built a neural net just for tinder swipes

Sounds efficient haha

#

How about yours @misty flint?

misty flint Mar 23, 2021, 3:07 AM

#

wait let me see if i can find a link

wide sorrel Mar 23, 2021, 3:08 AM

#

hello

#

how would you recommend begging to learn about ai?

misty flint Mar 23, 2021, 3:11 AM

#

@stiff barn https://towardsdatascience.com/m2m-day-89-how-i-used-artificial-intelligence-to-automate-tinder-ced91b947e53

Medium

M2M Day 90— How I used Artificial Intelligence to automate Tinder

This post is a part of Jeff’s 12-month, accelerated learning project called “Month to Master.” For March, he is downloading the ability to…

#

💀

wide sorrel Mar 23, 2021, 3:11 AM

#

beginning

misty flint Mar 23, 2021, 3:12 AM

#

if you come from a non-technical background, you can start with andrew ng's AI for Everyone course on coursera

#

its a good start

stiff barn Mar 23, 2021, 3:12 AM

#

Seems to be the one people gravitate to. Must be good

misty flint Mar 23, 2021, 3:12 AM

#

stiff barn How about yours @misty flint?

my learning? im still working on group projects haha

#

we barely finished a 2-3 week long one where we made a contract analysis app with some basic nlp

misty flint Mar 23, 2021, 3:13 AM

#

stiff barn Seems to be the one people gravitate to. Must be good

its good for non-technical people imo. i like andrew's business perspective on AI

#

he goes through what kind of business projects AI/ML is good at vs. those that arent good projects

#

and then walks through how to try to build up a AI/data culture at your company if youre trying to create buy-in/not everyone is onboard with change

#

lol

exotic maple Mar 23, 2021, 3:15 AM

#

misty flint @stiff barn https://towardsdatascience.com/m2m-day-89-how-i-used-arti...

this is the kind of project i need

stiff barn Mar 23, 2021, 3:16 AM

#

Pretty interesting and quick read

wide sorrel Mar 23, 2021, 3:16 AM

#

im looking for a course thats more hands-on, being that im already somewhat fluent in python

stiff barn Mar 23, 2021, 3:17 AM

#

misty flint he goes through what kind of business projects AI/ML is good at vs. those that a...

Sounds pretty useful. That's something a lot of people would skip

misty flint Mar 23, 2021, 3:19 AM

#

yeah but i wouldnt really recommend it to most technical peeps unless theyre interested in going into management or part of a large team

#

at the very least, you can 2x through the videos and get through the gist of it pretty quickly

#

DoggoKek

misty flint Mar 23, 2021, 3:20 AM

#

exotic maple this is the kind of project i need

i shall patiently await for your results

#

then you can report back to the class

#

DoggoKek

exotic maple Mar 23, 2021, 3:20 AM

#

The problem is, i have no training data

#

1 sample from my side :v

#

few dimensions

#

F

misty flint Mar 23, 2021, 3:21 AM

#

looks like youll have to do some swiping

#

to feed the model

stiff barn Mar 23, 2021, 3:22 AM

#

Haha yup, gotta build it up

misty flint Mar 23, 2021, 3:22 AM

#

DoggoKek

stiff barn Mar 23, 2021, 3:22 AM

#

That's the fun part

misty flint Mar 23, 2021, 3:22 AM

#

honestly that sounds like a hilarious project to have on your resume

#

def a talking point

stiff barn Mar 23, 2021, 3:23 AM

#

For sure

#

I'm surprised that dude labeled 10,000. That's a lot....

misty flint Mar 23, 2021, 3:23 AM

#

thats pretty wild yeah but hes done other crazy things before

stiff barn Mar 23, 2021, 3:23 AM

#

I built a client and labeled like 2000 apartments for my project and that took a long long time

misty flint Mar 23, 2021, 3:24 AM

#

oh yeah he did this challenge called 12 months to mastery which is honestly pretty ridic

#

https://medium.com/@dj.jeffmli/the-learning-quest-mastering-12-skills-in-12-months-125218780fde

Medium

Month 2 Master: Mastering 12 skills in 12 months

If you’re interested in learning more about me, check out my website.

#

@stiff barn March was his Tinder bot month

#

💀

stiff barn Mar 23, 2021, 3:25 AM

#

Very interesting haha

#

Can see a lot of room for improvement but super cool for a fast project

exotic maple Mar 23, 2021, 3:26 AM

#

i really need to learn CNN

#

and OpenCV

#

the whole image-analysis thing makes for some very damn good portfolio projects

misty flint Mar 23, 2021, 3:26 AM

#

did opencv for one project but nowhere near mastery

#

DoggoKek

#

pillow is a cool library too

#

using that for another project

stiff barn Mar 23, 2021, 3:27 AM

#

The nice thing about that area is that it's where the bulk of the pre-trained models for transfer learning are, so you can get good results quickly.

#

I like pillow. Keeps things simple

exotic maple Mar 23, 2021, 3:27 AM

#

I think i want to make a repo with 3 projects. One purely "analytics" and visualization
One with a ML model (not sure of how to show it here)
One with an image NN (havent even started NNs lol)

misty flint Mar 23, 2021, 3:27 AM

#

yeah i feel like projects are now either CV or NLP focused

#

at least the "interesting" ones

exotic maple Mar 23, 2021, 3:28 AM

#

misty flint yeah i feel like projects are now either CV or NLP focused

bro i just started checking NLP out and i hate it

#

though, I actually feel like doing a sentiment analysis miniproject in spanish (my mother language)

misty flint Mar 23, 2021, 3:28 AM

#

well i heard you usually end up choosing one or the other

stiff barn Mar 23, 2021, 3:28 AM

#

NLP is weird because you just know GPT-3 is there and you'll never get anywhere near it

misty flint Mar 23, 2021, 3:28 AM

#

so its okay

#

DoggoKek

exotic maple Mar 23, 2021, 3:28 AM

#

but id need to scrape some language

#

yeah NLP is a dead end with GPT and transformer models lol

misty flint Mar 23, 2021, 3:28 AM

#

exotic maple though, I actually feel like doing a sentiment analysis miniproject in spanish (...

that would be cool. i think one of my team members is doing sentiment analysis for russian

exotic maple Mar 23, 2021, 3:29 AM

#

truth be told, with how behind most companies are, even a simple KNN implementation would do wonders pydis_snake

misty flint Mar 23, 2021, 3:29 AM

#

stiff barn NLP is weird because you just know GPT-3 is there and you'll never get anywhere ...

yeah but i think there will still be cool business use-cases, no?

stiff barn Mar 23, 2021, 3:29 AM

#

Would be nice if GPT-3 was open so we could use it for transfer learning and such. I have access to the api for it but that's limited to some extent.

misty flint Mar 23, 2021, 3:30 AM

#

maybe it will be open more in the future

exotic maple Mar 23, 2021, 3:30 AM

#

bro, no joke, I have a guy in my BU trying to make a "bot" for classifying cases? His idea, REGEX!!

misty flint Mar 23, 2021, 3:30 AM

#

pithink

stiff barn Mar 23, 2021, 3:30 AM

#

There are still plenty of use cases for NLP.

exotic maple Mar 23, 2021, 3:30 AM

#

motherfucker I can do that better with a simple KNN lmao

stiff barn Mar 23, 2021, 3:30 AM

#

And GPT-3 is spawning many businesses.

#

Probably will be closed forever though since Microsoft bought the exclusive license to it.

misty flint Mar 23, 2021, 3:30 AM

#

sadoru

#

maybe it will be an azure service?

#

blobhyperthink

tidal bough Mar 23, 2021, 3:30 AM

#

exotic maple bro, no joke, I have a guy in my BU trying to make a "bot" for classifying cases...

machine learn your way to the right regexp lemon_pleased

stiff barn Mar 23, 2021, 3:31 AM

#

It will be for sure. And you can get access to the api if you ask nicely and wait a long time.

misty flint Mar 23, 2021, 3:31 AM

#

DoggoKek

#

ValkNaruhodo

exotic maple Mar 23, 2021, 3:31 AM

#

there's literally a startup to make web apps, based on GPT-3, that only needs a rough description of the app

#

https://debuild.co/

Debuild

Build web apps lightning fast.

misty flint Mar 23, 2021, 3:32 AM

#

ValkNaruhodo

stiff barn Mar 23, 2021, 3:32 AM

#

Yeah, the stuff being built with GPT-3 is very interesting

exotic maple Mar 23, 2021, 3:32 AM

#

it's pretty rough from what i've seen, but c'mon, can you imagine the chaos of getting rid of half of "full stack devs"?

stiff barn Mar 23, 2021, 3:33 AM

#

This was cool as well

#

https://openai.com/blog/dall-e/

OpenAI

DALL·E: Creating Images from Text

We’ve trained a neural network called DALL·E that creates images from text captions for a wide range of concepts expressible in natural language.

exotic maple Mar 23, 2021, 3:33 AM

#

AI will kill us all

#

abandon AI, return to Amish

stiff barn Mar 23, 2021, 3:33 AM

#

Agreed

misty flint Mar 23, 2021, 3:34 AM

#

haha this is great. i like the avocado chair

exotic maple Mar 23, 2021, 3:34 AM

#

REx

#

are you a student?

stiff barn Mar 23, 2021, 3:34 AM

#

Avocado chair is pretty good

misty flint Mar 23, 2021, 3:35 AM

#

hmm?

#

yes..?

#

NervousSip

exotic maple Mar 23, 2021, 3:35 AM

#

Lol

#

i'll send you a DM

misty flint Mar 23, 2021, 3:35 AM

#

NervousSip

#

ok

exotic maple Mar 23, 2021, 3:35 AM

#

oof

#

privated

#

just like in tinder

#

-cries-

misty flint Mar 23, 2021, 3:36 AM

#

Oopsies

arctic wedgeBOT Mar 23, 2021, 4:23 AM

#

Hey @tardy plover!

It looks like you tried to attach file type(s) that we do not allow (.pdf). We currently allow the following file types: .gif, .jpg, .jpeg, .mov, .mp4, .mpg, .png, .mp3, .wav, .ogg, .webm, .webp, .flac, .m4a.

Feel free to ask in #community-meta if you think this is a mistake.

frigid forum Mar 23, 2021, 6:59 AM

#

pydis_strong

clever karma Mar 23, 2021, 8:32 AM

#

How much linear algebra do people in machine learning actually use?

ivory pendant Mar 23, 2021, 8:54 AM

#

35% in pinned messages

gentle bramble Mar 23, 2021, 10:39 AM

#

hi guys

#

so i'm an intermediate in python

#

and i wrote a tutorial on how to make a face recognition program with python

#

if anyone has any spare time i'd love if you checked it out and tell me if i have any mistakes

#

https://aliboughazi221b.medium.com/how-to-write-a-face-recognition-program-in-python-45cb5c031741

Medium

How to write a face recognition program in python?

Maybe you’ve been watching a lot of techy tv shows,maybe you’re learning how to code and need a project to practice on, or maybe you’re…

frigid forum Mar 23, 2021, 10:54 AM

#

gentle bramble https://aliboughazi221b.medium.com/how-to-write-a-face-recognition-program-in-py...

nice one! intermediate and already writing tutorials? very cool

tidal bronze Mar 23, 2021, 10:59 AM

#

what do you guys think should be my k in this case?

serene scaffold Mar 23, 2021, 12:59 PM

#

exotic maple yeah NLP is a dead end with GPT and transformer models lol

You're saying that because of GPT and transformers, there's nothing else to do in NLP?

#

My coworkers haven't run out of things to do 🤷‍♂️

exotic maple Mar 23, 2021, 1:07 PM

#

I mean, I said it as a kind of tongue-in-cheek kind of joke, but I can see how it definitely didnt come across as it

#

My hate for NLP is also not subtle :p

grave frost Mar 23, 2021, 1:08 PM

#

There is so much to learn in NLP that I am dying every day

#

it's like every paper has some different technique and there's a whole flood of them

serene scaffold Mar 23, 2021, 1:25 PM

#

exotic maple I mean, I said it as a kind of tongue-in-cheek kind of joke, but I can see how i...

Puts down banhammer

serene scaffold Mar 23, 2021, 1:25 PM

#

exotic maple My hate for NLP is also not subtle :p

raises banhammer

exotic maple Mar 23, 2021, 1:25 PM

#

😔

grave frost Mar 23, 2021, 1:26 PM

#

exotic maple My hate for NLP is also not subtle :p

what do you find boring about it? (just curious)

exotic maple Mar 23, 2021, 1:27 PM

#

I said hate, not boring

grave frost Mar 23, 2021, 1:28 PM

#

what do you hate about it then?

raw minnow Mar 23, 2021, 2:26 PM

#

Hiiii, I have a question

#

If I want to start learning data science

#

should i learn jupyter, rstudio, watson studio,... or python's numpy, pandas, matplotlib, seaborn... first?

#

thanks!

#

is machine learning related to data science?

keen kestrel Mar 23, 2021, 2:31 PM

#

emacs

exotic maple Mar 23, 2021, 2:47 PM

#

grave frost what do you hate about it then?

its a kind of irrational hate. Lemmatization (especially of internet l33t speech) is absurdly annoying
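To make the complaint concrete, here is a rough sketch of the kind of pre-processing leet speak forces on you before a lemmatizer can do anything useful. The character mapping and the name `normalize_leet` are assumptions made for this example, not a standard; real leet is ambiguous (`1` can mean `l` or `i`) and a token like `mp3` would be mangled, which is part of why this is annoying.

```py
# Hypothetical, deliberately small mapping from leet characters to letters.
LEET_MAP = str.maketrans({
    "0": "o", "1": "l", "3": "e", "4": "a",
    "5": "s", "7": "t", "@": "a", "$": "s",
})


def normalize_leet(token):
    """Undo common leet substitutions in tokens that contain at least one letter."""
    if not any(ch.isalpha() for ch in token):
        return token  # leave pure numbers such as "2021" (and "1337") alone
    return token.lower().translate(LEET_MAP)


tokens = ["l33t", "h4x0r", "n00b", "2021"]
print([normalize_leet(t) for t in tokens])
```

The normalized tokens could then be passed to whatever lemmatizer is in use; that step is left out here on purpose.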

serene scaffold Mar 23, 2021, 2:54 PM

#

raw minnow is machine learning related to data science?

yes, it's a subset of data science. and it's an approach to AI

raw minnow Mar 23, 2021, 2:55 PM

#

@serene scaffold wow, I thought machine learning and deep learning are both AI

keen kestrel Mar 23, 2021, 2:56 PM

#

raw minnow @serene scaffold wow, I thought machine learning and deep learning are bot...

AI is what you say in your resume, deep learning during interview, machine learning during your work

grave frost Mar 23, 2021, 2:57 PM

#

serene scaffold yes, it's a subset of data science. and it's an approach to AI

Data science has an intersection with artificial intelligence but is not a subset of artificial intelligence
its mostly opinionated - I for one agree with the above

raw minnow Mar 23, 2021, 2:58 PM

#

so is an if-else program to determine whether a number is even or odd artificial intelligence?

grave frost Mar 23, 2021, 2:58 PM

#

exotic maple its a kind of irrational hate. Lemmatization (especially of internet l33t speech)...

I don't even know what i33t is

grave frost Mar 23, 2021, 2:59 PM

#

raw minnow so is an if-else program to determine whether a number is even or odd artificial...

yes - extremely basic or naive, it tries to mimic artificial intelligence, keeping in mind the modern interpretation

#

it technically counts as logic, but doesn't actually exhibit intelligence, you can't say its AI

odd lion Mar 23, 2021, 3:00 PM

#

grave frost I don't even know what i33t is

Ouch, this hurts. https://en.wikipedia.org/wiki/Leet

Leet

Leet (or "1337"), also known as eleet or leetspeak, is a system of modified spellings used primarily on the Internet. It often uses character replacements in ways that play on the similarity of their glyphs via reflection or other resemblance. Additionally, it modifies certain words based on a system of suffixes and alternate meanings. There ar...

raw minnow Mar 23, 2021, 3:01 PM

#

i'm currently learning computer science at college as a first year

grave frost Mar 23, 2021, 3:01 PM

#

Again though, most of the definitions are up for discussion and opinion 🤷 and you would find plenty of ideas online

raw minnow Mar 23, 2021, 3:01 PM

#

is data science a subset of computer science or does it use computer science as a tool?

austere swift Mar 23, 2021, 3:01 PM

#

computer science is a tool for it

#

you can do data science by hand

#

but, who wants to do that 😆

lapis sequoia Mar 23, 2021, 3:02 PM

#

Can any1 help me?

odd lion Mar 23, 2021, 3:02 PM

#

I think most DS studies would fall into CS schools these days because the majority of the work involves CS. But economists use DS all the time, and so do business schools, agriculture, etc.

austere swift Mar 23, 2021, 3:02 PM

#

lapis sequoia Can any1 help me?

whats your question?

lapis sequoia Mar 23, 2021, 3:02 PM

#

I have a file that I've written, but I'm facing some issues and the file won't run

arctic wedgeBOT Mar 23, 2021, 3:02 PM

#

Hey @lapis sequoia!

It looks like you tried to attach a Python file - please use a code-pasting service such as https://paste.pythondiscord.com

austere swift Mar 23, 2021, 3:02 PM

#

can you elaborate

#

!code

arctic wedgeBOT Mar 23, 2021, 3:02 PM

#

Here's how to format Python code on Discord:

```py
print('Hello world!')
```

These are backticks, not quotes. Check this out if you can't find the backtick key.

lapis sequoia Mar 23, 2021, 3:03 PM

#

I made a file in python

grave frost Mar 23, 2021, 3:03 PM

#

you mean you wrote code

lapis sequoia Mar 23, 2021, 3:03 PM

#

but there is an error

#

ye exactly

austere swift Mar 23, 2021, 3:03 PM

#

can you show some code or tell us what the error is

lapis sequoia Mar 23, 2021, 3:03 PM

#

and i couldn't solve this error

grave frost Mar 23, 2021, 3:03 PM

#

is it specific to AI / Data-science?

austere swift Mar 23, 2021, 3:03 PM

#

^

#

if its just a general issue then you can claim a help channel #❓｜how-to-get-help

lapis sequoia Mar 23, 2021, 3:04 PM

#

#

idk what this means

austere swift Mar 23, 2021, 3:04 PM

#

it means what it says

#

invalid syntax

#

you wrote the code wrong

#

this doesnt look like it has anything to do with data science though

#

so you should claim a help channel

lapis sequoia Mar 23, 2021, 3:04 PM

#

Okay ik
So what is the right code to

#

how :/

#

i jus joined

#

here

austere swift Mar 23, 2021, 3:05 PM

#

it looks like you already have one lol

lapis sequoia Mar 23, 2021, 3:05 PM

#

o;

misty flint Mar 23, 2021, 3:05 PM

#

memecringeharold

raw minnow Mar 23, 2021, 3:05 PM

#

#

#

#

is this a good road map?

grave frost Mar 23, 2021, 3:06 PM

#

why are you learning jupyter BTW?

raw minnow Mar 23, 2021, 3:06 PM

#

it's included in the course, i didnt specifically pick it

misty flint Mar 23, 2021, 3:06 PM

#

are you learning on mobile?

#

uh oh

#

memecringeharold

grave frost Mar 23, 2021, 3:06 PM

#

and I recommend you drop roadmaps and instead learn the basics and then what you like, rather than following some set path

raw minnow Mar 23, 2021, 3:06 PM

#

no, its a screenshot from my laptop

misty flint Mar 23, 2021, 3:07 PM

#

ah its cropped

#

from..it looks like udemy?

raw minnow Mar 23, 2021, 3:07 PM

#

yup

misty flint Mar 23, 2021, 3:07 PM

#

might as well start somewhere. as long you think you can complete it

#

just know its not comprehensive

grave frost Mar 23, 2021, 3:08 PM

#

tbh I find a mindset that "I have to learn x thing by y time" to be the most unproductive one ever. its not how we really learn things

raw minnow Mar 23, 2021, 3:08 PM

#

because there is a course from edx too but it's very different and has things like sql, rstudio, jupyter lab, watson studio,...

serene scaffold Mar 23, 2021, 3:09 PM

#

raw minnow

I discourage Python learners from touching jupyter until they have more experience with the language, as it makes everything more difficult to debug and encourages you to not think about code re-usability.

grave frost Mar 23, 2021, 3:12 PM

#

I don't get why everyone's like "I started coding and aim to build an app in 1 month and then learn AI at the end of 5th month" after that just milk that 120K. That is such a bad mindset. you end up leaving CS in 2 weeks just because you don't like it.

This isn't your school exams that you force yourself and it doesn't matter much if you forget everything (or worse, just rote memorize and forget). CS takes time, years, decades of work to be good at it. roadmaps are a good indicator as to what amount of knowledge an average person is expected to have at a certain point, but its not a path set in stone to follow for eternity.

austere swift Mar 23, 2021, 3:13 PM

#

yeah honestly imo it's better to just think of an application or project you wanna learn how to do, then learn about that specific topic/project to be able to do it. then after you learn one project and like the basics of it, you can adapt your code to do other stuff as well

#

so like if you start off with something like "i want to be able to visualize this dataset", then you learn about how to use the different visualization tools, learn about data management and stuff like pandas/numpy, etc

#

later, you can use that same code with the same dataset, and learn more stuff building on that

#

by building up on concepts you learn it makes it a lot easier than just learning stuff in order

exotic maple Mar 23, 2021, 3:15 PM

#

Literally I love @austere swift 's approach

#

personally i set myself 3 goals, in increasing difficulty :

grave frost Mar 23, 2021, 3:16 PM

#

austere swift so like if you start off with something like "i want to be able to visualize thi...

visualizing can be an interesting project too - say you have real-world data of the amount of time kids study and what their behavior is. then visualizing is a great project because then I would be genuinely interested in whether kids who study more have a higher chance of depression or not.

you can do almost everything better and learn faster too if you have the motivation 🙂

exotic maple Mar 23, 2021, 3:16 PM

#

visualization project with python
ML application, simple with python (predict something with an ML model)
More complex, CV application with CNN

#

and im basing my learning on that

#

mostly

grave frost Mar 23, 2021, 3:18 PM

#

I used to do personal projects for learning all the basics - now I have reduced those (because I can't manage the time very well) but I find competitions much more encouraging to explore experimental techniques and somehow apply them to increase my LB score.

hidden cove Mar 23, 2021, 3:18 PM

#

Hello everybody, I have a question: how do I calculate the average of a signal, please?

grave frost Mar 23, 2021, 3:18 PM

#

That's why I encourage beginners to do those simple kaggle competitions (ones which have a monthly LB) to learn more

hidden cove Mar 23, 2021, 3:19 PM

#

the statement tells me: create a function that evaluates the average of a signal
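One possible reading of that exercise, assuming the signal is a finite sequence of uniformly spaced samples and "average" means the arithmetic mean of those samples. The function name `signal_average` is made up for this sketch; with NumPy arrays, `numpy.mean` does the same job.

```py
import math


def signal_average(samples):
    """Arithmetic mean of a uniformly sampled signal."""
    samples = list(samples)
    if not samples:
        raise ValueError("cannot average an empty signal")
    return sum(samples) / len(samples)


# Example signal: one full period of a sine wave riding on a constant offset of 2.0.
n = 1000
signal = [2.0 + math.sin(2 * math.pi * k / n) for k in range(n)]
print(signal_average(signal))  # close to the offset, since the sine part cancels out
```

If the samples are not uniformly spaced, a time-weighted average would be needed instead; that case is not covered here.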

austere swift Mar 23, 2021, 3:19 PM

#

grave frost I used to do personal projects for learning all the basics - now I have reduced ...

yeah whenever i wanna learn something i just start a project on it

grave frost Mar 23, 2021, 3:19 PM

#

austere swift yeah whenever i wanna learn something i just start a project on it

agreed. its the most fun way 😁

austere swift Mar 23, 2021, 3:21 PM

#

yeah its a lot more interesting than learning it from somewhere online since you can actually see the results of what you did

#

and satisfying

exotic maple Mar 23, 2021, 3:26 PM

#

grave frost That's why I encourage beginners to do those simple kaggle competitions (one whi...

LB?

grave frost Mar 23, 2021, 3:26 PM

#

exotic maple LB?

Leaderboard - usually refers to the leaderboard score

exotic maple Mar 23, 2021, 3:26 PM

#

oh

#

@grave frost answering your question. It might not be that i dislike NLP, but mostly that i'm just pissed off at the awful quality of teaching ive had of it so far lol

#

so i'll probably have to relearn it from scratch if i ever use it

grave frost Mar 23, 2021, 3:27 PM

#

exotic maple @grave frost answering your question. It might not be that i dislike N...

how bad?

#

(on a scale of 1-10)

exotic maple Mar 23, 2021, 3:28 PM

#

2

grave frost Mar 23, 2021, 3:28 PM

#

exotic maple 2

meaning your teacher was drunk?

exotic maple Mar 23, 2021, 3:28 PM

#

awful explanation, the lecturer was as stimulating as a political speech and his explanations and analogies were shit and there was very little code or math along with it

#

might as well read wikipedia to learn it

grave frost Mar 23, 2021, 3:29 PM

#

exotic maple might as well read wikipedia to learn it

haha lol

exotic maple Mar 23, 2021, 3:29 PM

#

and honestly im a bit burnout too i think lol

grave frost Mar 23, 2021, 3:29 PM

#

sad, NLP is kinda interesting. though my interest in AI has been dwindling somewhat lately

exotic maple Mar 23, 2021, 3:31 PM

#

I hate it because i loved the ML part and i was actually excited about everything i did, even if i struggled with seemingly basic stuff someties

#

sometimes

#

so i went to NLP excited, but this guy killed me in a week lmao

light stump Mar 23, 2021, 3:38 PM

#

does anyone here have some experience with skimage.transform module?

#

i'm trying to implement either PolynomialTransform().estimate or PiecewiseAffineTransform().estimate and i'm getting errors that idk how to deal with properly

#

but any experience at all with skimage.transform would be helpful

exotic maple Mar 23, 2021, 3:43 PM

#

@grave frost is this what you meant?

#

https://www.kaggle.com/c/tabular-playground-series-mar-2021

Tabular Playground Series - Mar 2021

Practice your ML skills on this approachable dataset!

grave frost Mar 23, 2021, 3:43 PM

#

for what?

exotic maple Mar 23, 2021, 3:47 PM

#

the competitions

#

leaderboard

grave frost Mar 23, 2021, 3:51 PM

#

yeah, that's the LB - the top rankers in a competition

edgy edge Mar 23, 2021, 3:51 PM

#

Hello guys

#

How is the job thing in US concerning Data Science

lapis sequoia Mar 23, 2021, 3:54 PM

#

Can I ask here a pandas question?

exotic maple Mar 23, 2021, 3:54 PM

#

lapis sequoia Can I ask here a pandas question?

ask away

edgy edge Mar 23, 2021, 3:54 PM

#

lapis sequoia Can I ask here a pandas question?

Yeah, what is it?

lapis sequoia Mar 23, 2021, 3:55 PM

#

df['invoicepayed'] = df['invoicepayed'].replace([r'\N'], np.nan)

#

will replace \N with NaN in my dataframe, seems to work okay

#

if df['invoicepayed'].notnull():
    pd.to_datetime(df['invoicepayed'], format='%Y-%m-%d %H:%M:%S')

#

format works for not NaN and without the if-statement

#

so with if ... I try to convert only for not NaN

#

I get

#

"The truth value of a {0} is ambiguous. "

#

ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().

fast plover Mar 23, 2021, 3:58 PM

#

Quick pandas/numpy question: I want to get the mean of the bottom 10% of values in a series. How do I do this?
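
One possible way to get the mean of the bottom 10% of a Series, sketched on made-up data (the Series `s` and its values are assumptions for illustration only). Two variants are shown: a quantile cutoff and `nsmallest`. They agree here, but can differ when there are ties around the cutoff.

```python
import numpy as np
import pandas as pd

# Hypothetical example data: the integers 1..100 in shuffled order.
rng = np.random.default_rng(0)
s = pd.Series(rng.permutation(np.arange(1, 101)))

# Option 1: keep everything at or below the 10th percentile, then average.
cutoff = s.quantile(0.10)
bottom_mean_quantile = s[s <= cutoff].mean()

# Option 2: take the k smallest values, where k is 10% of the length.
k = max(1, int(len(s) * 0.10))
bottom_mean_nsmallest = s.nsmallest(k).mean()
```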

lapis sequoia Mar 23, 2021, 3:58 PM

#

I checked stackoverflow found some posts but I seem not to understand the underlying problem

exotic maple Mar 23, 2021, 4:04 PM

#

df['invoicepayed'].notnull()

#

this is not what you think it is

#

that returns a series

#

where each value in the column is evaluated to be NAN or not NAN

#

so if you do if df['invoicepayed'].notnull():
you get an error because you are saying

#

"Is Series True?"

#

and python cant answer that

#

@lapis sequoia you're not comparing each object inside the series, you're comparing the series itself there

abstract zealot Mar 23, 2021, 4:06 PM

#

Hi basically working with data frames with excess of 1 million rows and I’m using df.groupby, specifically
```py
for (a, b), c in df.groupby(by=['col1', 'col2']):
    ...  # calculations on each sub data frame c
```
I’ve noticed this is very very slow and was wondering if anyone had any suggestions for improvement? I’ve tried itertools groupby which slightly improved times, but I think because the column consists of strings maybe somehow converting the columns to an integer value might speed things up ? I have no idea but would love to try some of your guys suggestions 🙂

exotic maple Mar 23, 2021, 4:07 PM

#

a million rows shouldnt be that much of a problem for pandas @abstract zealot can you share a screenshot of your df?

lapis sequoia Mar 23, 2021, 4:07 PM

#

Ok. So I need to check for every element in the series not the series itself.

exotic maple Mar 23, 2021, 4:07 PM

#

lapis sequoia Ok. So I need to check for every element in the series not the series itself.

in theory, but check what pandas datetime does to nan values BEFORE trying to apply logic there

#

if datetime ignores nans, then you can just use it
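
A small sketch of the point above, on made-up data (the three sample values are assumptions): `pd.to_datetime` turns missing values into `NaT` on its own, so no `if` is needed, and a per-element condition is written as a boolean mask rather than an `if` on the whole Series.

```python
import numpy as np
import pandas as pd

# Hypothetical sample: two real timestamps and one "\N" placeholder.
df = pd.DataFrame(
    {"invoicepayed": ["2021-03-01 10:00:00", r"\N", "2021-03-05 08:30:00"]}
)

# Raw string so the backslash is taken literally.
df["invoicepayed"] = df["invoicepayed"].replace([r"\N"], np.nan)

# No `if` needed: missing values come out as NaT.
df["invoicepayed"] = pd.to_datetime(df["invoicepayed"], format="%Y-%m-%d %H:%M:%S")

# A per-element condition is expressed as a boolean mask, not an `if`.
paid = df[df["invoicepayed"].notnull()]
```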

abstract zealot Mar 23, 2021, 4:07 PM

#

Sorry my bad it’s 25 million you made me recheck hahaha

lapis sequoia Mar 23, 2021, 4:09 PM

#

oh.... I played in my jupyter notebook and didn't notice.

misty flint Mar 23, 2021, 4:10 PM

#

thats a lot of data

lapis sequoia Mar 23, 2021, 4:10 PM

#

exotic maple if datetime ignores nans, then you can just use it

Thank you very much.

misty flint Mar 23, 2021, 4:11 PM

#

itll probably take a long time regardless unless you have access to more processing power

exotic maple Mar 23, 2021, 4:12 PM

#

abstract zealot Hi basically working with data frames with excess of 1 million rows and I’m usin...

im not sure using a for loop is a good idea there

#

thats probably what's slowing you down

#

why are you looping to do a groupby anyways lol

misty flint Mar 23, 2021, 4:12 PM

#

lol

abstract zealot Mar 23, 2021, 4:14 PM

#

@exotic maple are there any better alternatives to groupby? I unfortunately need to do calculations on Sub data frames returned by groupby for certain values

exotic maple Mar 23, 2021, 4:14 PM

#

you can try just aggregating or passing a custom function

#

IF i get you right

#

for example, you want the SUM of N values of a row in a groupby

#

df.groupby("RELEVANT GROUP").agg({"COLUMN TO SUMMARIZE": SUMMARY FUNCTION})
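
A minimal sketch of the aggregate-instead-of-loop idea, assuming a frame with two string key columns and one numeric column (the column names and values below are invented). Whether it is fast enough on 25 million rows would need measuring; the categorical conversion is one thing that may help with string keys.

```python
import pandas as pd

# Hypothetical stand-in for the real frame.
df = pd.DataFrame(
    {
        "col1": ["a", "a", "b", "b", "b"],
        "col2": ["x", "x", "y", "y", "z"],
        "value": [1.0, 2.0, 3.0, 4.0, 5.0],
    }
)

# One vectorised pass instead of a Python-level loop over sub-frames.
summary = (
    df.groupby(["col1", "col2"], sort=False)["value"]
    .agg(["sum", "mean", "count"])
    .reset_index()
)

# String keys can be stored as categoricals, which often makes grouping cheaper.
df_cat = df.astype({"col1": "category", "col2": "category"})
summary_cat = df_cat.groupby(["col1", "col2"], observed=True)["value"].sum()
```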

abstract zealot Mar 23, 2021, 4:15 PM

#

I’ll definitely try something like this and let you know thank you very much man

exotic maple Mar 23, 2021, 4:16 PM

#

you can also try it via pivot tables

#

but i found pivot tables in pandas...odd. i prefer grouping manually lol

abstract zealot Mar 23, 2021, 4:30 PM

#

Another quick question @exotic maple what if in addition to grouping, I wanted to only look at the data frames in intervals from rows 0-20, 20-40, 40-60 etc

#

Is this possible with the method you describe?

exotic maple Mar 23, 2021, 4:32 PM

#

You can by grouping by partitions?

#

You mean?

abstract zealot Mar 23, 2021, 4:32 PM

#

I think so yes

exotic maple Mar 23, 2021, 4:33 PM

#

Eh ive never done that but i think you can.

Id try this.
df["splits"] = pd.cut(df["COLUMN TO SPLIT"], 5)
This will create 5 equal-width bins for splitting

Then id use groupby by that column

#

Im sure theres a better way but i dont have a sample df nor energy right now lol
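
A rough sketch of both ideas, on an invented 50-row frame: grouping by fixed row intervals (0-19, 20-39, ...) using integer division on the row position, and `pd.cut` on a single column. The block size of 20 comes from the question; everything else is an assumption.

```python
import numpy as np
import pandas as pd

# Hypothetical frame with 50 rows.
df = pd.DataFrame({"value": np.arange(50)})

# Integer-divide the row position by the block size: rows 0-19 -> 0, 20-39 -> 1, ...
block = np.arange(len(df)) // 20
per_block = df.groupby(block)["value"].sum()

# pd.cut works on one column (or the index), not on a whole DataFrame.
df["splits"] = pd.cut(df["value"], bins=5)
per_bin = df.groupby("splits", observed=True)["value"].count()
```

The `block` array can also be passed alongside column names, e.g. `df.groupby(["col1", block])`, if the row intervals need to be combined with the existing grouping.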

abstract zealot Mar 23, 2021, 4:33 PM

#

Jahahaha no problem thank you very much again

short heart Mar 23, 2021, 4:35 PM

#

ValueError: Input 0 of layer sequential is incompatible with the layer: expected ndim=3, found ndim=2. Full shape received: (None, 60)

#

any ideas how to fix that?

tidal bronze Mar 23, 2021, 4:47 PM

#

how can I save cluster mappings of Kmeans aggregation to a dictionary?

#

I used sklearn if it matters

fading kernel Mar 23, 2021, 5:11 PM

#

@tidal bronze test this: https://stackoverflow.com/questions/60858780/dict-of-cluster-and-partition-with-kmeans-python

Stack Overflow

Dict of cluster and partition with kmeans python

I search a solution to my problem.

I use Kmeans by sklearn and i want a dictionary with { cluster : list of partition}

kmeans = KMeans(n_clusters=n)
kmeans.fit(data)

result = zip(data,kmeans.lab...

tidal bronze Mar 23, 2021, 5:15 PM

#

clust_map = dict(zip(agg_df.index, agg_df["an_vol_cluster"]))

I used this in the end which seems to be quite similar to what you are suggesting 😄

#

thanks anyway @fading kernel

short heart Mar 23, 2021, 5:21 PM

#

How do i put a 2d array into sequential in keras

#

Or how do i reshape it

short heart Mar 23, 2021, 5:46 PM

#

model = Sequential()
model.add(LSTM(4,input_shape=(940,60),return_sequences=True))
model.add(Dense(1))
model.compile(loss='mean_squared_error', optimizer='adam')
model.fit(X_train, y_train, epochs = 100, batch_size = 32)

grave frost Mar 23, 2021, 5:50 PM

#

input?

short heart Mar 23, 2021, 5:50 PM

#

an array of minmaxed values

#

give me a sec

#


[[1.98211310e-02 2.13912644e-02 2.05622164e-02 ... 7.13034744e-02
  8.30816740e-02 8.40888464e-02]
 [2.13912644e-02 2.05622164e-02 2.05600173e-02 ... 8.30816740e-02
  8.40888464e-02 8.43153503e-02]
 [2.05622164e-02 2.05600173e-02 2.06237903e-02 ... 8.40888464e-02
  8.43153503e-02 8.45660438e-02]
 ...
 [5.82092875e-05 4.71699742e-05 4.45750758e-05 ... 1.26729997e-03
  1.00240043e-03 1.10061074e-03]
 [4.71699742e-05 4.45750758e-05 4.63343289e-05 ... 1.00240043e-03
  1.10061074e-03 1.38178337e-03]
 [4.45750758e-05 4.63343289e-05 4.59165063e-05 ... 1.10061074e-03
  1.38178337e-03 1.59278379e-03]] ```

#

looks something like this

grave frost Mar 23, 2021, 5:51 PM

#

I mean using the input layer

#

tf.keras.layers.Input

#

and there is also a reshape layer https://www.tensorflow.org/api_docs/python/tf/keras/layers/Reshape

TensorFlow

tf.keras.layers.Reshape | TensorFlow Core v2.4.1

Layer that reshapes inputs into the given shape.

short heart Mar 23, 2021, 5:59 PM

#

idk

#

im having a brain fart at this point

#

i tried reshaping it like this X_train = np.reshape(X_train, (940, 60, 1)) which worked before btw, but now it gives out this error ValueError: total size of new array must be unchanged, input_shape = [60, 1], output_shape = [940, 60]
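
A sketch of the shape bookkeeping, assuming `X_train` really is 940 samples of 60 values each (random data stands in for the real array). Keras recurrent layers expect 3D input of the form (samples, timesteps, features), and `input_shape` leaves out the sample axis, so `input_shape=(940, 60)` in the model above may be part of the problem.

```python
import numpy as np

# Assumed shape, taken from the messages above: 940 samples of 60 values each.
X_train = np.random.rand(940, 60)

# Add a trailing feature axis: (samples, timesteps, features) = (940, 60, 1).
X_train_3d = X_train.reshape(X_train.shape[0], X_train.shape[1], 1)

# Equivalent ways to add the axis.
X_alt = X_train[..., np.newaxis]
X_alt2 = np.expand_dims(X_train, axis=-1)

# Keras' input_shape leaves out the batch axis, so for this data it would be (60, 1).
input_shape = X_train_3d.shape[1:]
```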

abstract zealot Mar 23, 2021, 6:34 PM

#

@exotic maple your suggestion works! As does using a lambda function in the groupby 🙂

short heart Mar 23, 2021, 6:56 PM

#

short heart i tried reshaping it like this ```X_train = np.reshape(X_train, (940, 60, 1))```...

So, any ideas how to fix that?

surreal radish Mar 23, 2021, 6:59 PM

#

short heart So, any ideas how to fix that?

what is your array length ?

short heart Mar 23, 2021, 6:59 PM

#

60

#

Should be 60

surreal radish Mar 23, 2021, 6:59 PM

#

940 * 60 is not 60 !!!!

short heart Mar 23, 2021, 7:00 PM

#

oh u meant that

#

Well yeah, so what?

surreal radish Mar 23, 2021, 7:00 PM

#

you can use this for the second argument of reshape :

#

(1,len(X_train))

#

does it work?

short heart Mar 23, 2021, 7:01 PM

#

but its 940,60…

surreal radish Mar 23, 2021, 7:02 PM

#

you want 3D array ?

short heart Mar 23, 2021, 7:03 PM

#

Yes, i do

#

For a sequential in keras

surreal radish Mar 23, 2021, 7:04 PM

#

as far as i know, the product of the dimensions should equal the total number of elements in your array

short heart Mar 23, 2021, 7:04 PM

#

Wdym

surreal radish Mar 23, 2021, 7:05 PM

#

i mean multiplication of values in second argument of reshape

short heart Mar 23, 2021, 7:05 PM

#

So what u r saying is multiplication of second arguments should equal 3?

surreal radish Mar 23, 2021, 7:05 PM

#

no no

surreal radish Mar 23, 2021, 7:05 PM

#

surreal radish i mean multiplication of values in second argument of reshape

i mean this

short heart Mar 23, 2021, 7:06 PM

#

Just give an example...

#

I dont think it solves the problem though

misty flint Mar 23, 2021, 7:07 PM

#

i did matrix multiplication by hand the other day

surreal radish Mar 23, 2021, 7:07 PM

#

for example, say you have an array with 12 elements, okay? if you want to reshape it to any dimensions, the product of those numbers should be 12

misty flint Mar 23, 2021, 7:07 PM

#

tldr: it was not fun

#

DoggoKek

lucid ferry Mar 23, 2021, 7:07 PM

#

Hey,
I have this def that takes a string and a DataFrame as arguments.
def accuracy_by_species(specie_name, df):

Now, I want to use apply function and pass DataFrame as argument. Is this possible?
Something like:
['a', 'b', 'c'].apply(accuracy_by_species, MyDF)

Any help would be appreciated.
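
One way this could work, sketched with an invented DataFrame and a stand-in body for `accuracy_by_species` (the real function is not shown in the chat). A plain list has no `.apply`, so the names are wrapped in a `pd.Series`; extra arguments are passed through `args` or as keywords.

```python
import pandas as pd

# Hypothetical data and a stand-in for the real accuracy_by_species.
my_df = pd.DataFrame(
    {
        "species": ["a", "a", "b", "c"],
        "correct": [True, False, True, True],
    }
)

def accuracy_by_species(specie_name, df):
    """Share of correct rows for one species (illustrative body only)."""
    rows = df[df["species"] == specie_name]
    return rows["correct"].mean()

# Extra positional arguments go in `args`; keyword arguments work too.
names = pd.Series(["a", "b", "c"])
via_args = names.apply(accuracy_by_species, args=(my_df,))
via_kwarg = names.apply(accuracy_by_species, df=my_df)

# Without pandas, a comprehension does the same job.
via_list = [accuracy_by_species(n, my_df) for n in ["a", "b", "c"]]
```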

surreal radish Mar 23, 2021, 7:08 PM

#

surreal radish for example you have array with length 12 okay ? if you want to reshape it to an...

for example (2,3,2) or (1,12) or (3,4,1) or ... are all valid

short heart Mar 23, 2021, 7:08 PM

#

R u sure it works that way

surreal radish Mar 23, 2021, 7:09 PM

#

yes, because it's a rule. you can check it in the numpy documentation
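
The rule can be checked in a few lines. This is only an illustration with a 12-element array, using the shapes from the messages above.

```python
import numpy as np

a = np.arange(12)            # 12 elements in total

# Every valid target shape multiplies out to 12.
shapes = [(2, 3, 2), (1, 12), (3, 4, 1)]
reshaped = [a.reshape(s) for s in shapes]

# -1 asks numpy to infer that one dimension from the total size.
inferred = a.reshape(3, -1)

# A shape whose product is not 12 raises ValueError.
try:
    a.reshape(5, 3)
    failed = False
except ValueError:
    failed = True
```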

exotic maple Mar 23, 2021, 7:16 PM

#

abstract zealot <@263491859173736449> your suggestion works! As does using a lambda function in ...

glad to be of help man

red yew Mar 23, 2021, 7:27 PM

#

Hey there, I have a pretty common task that I struggle with in python, usually trying to use pandas and matplotlib. I've seen guides online for similar things, but never quite this issue, which I'd think is very common:

I have a series of discrete events broken up into timestamps, like "message sent at timestamp x", and some 10,000 of those. The timestamps span maybe 13 months. All I want to do is bin that data into days, so like "10 messages received on Jan 1st, 12 on Jan 2nd, 7 on Jan 3rd," etc. I'd like to see it on a graph, showing the number of events per day over time, to see trends. I've already converted the timestamps into epoch time (seconds since 1970) and have it in CSV form and as a dataframe in pandas.

Anyone know how I can do this?

grave frost Mar 23, 2021, 7:31 PM

#

misty flint i did matrix multiplication by hand the other day

how is that bad?

exotic maple Mar 23, 2021, 7:31 PM

#

red yew Hey there, I have a pretty common task that I struggle with in python, usually t...

well, first of all

1. convert the timestamps to a format understood by pandas.
2. extract the date only portion of the timestamp
3. perform groupby and aggregate sum

grave frost Mar 23, 2021, 7:33 PM

#

red yew Hey there, I have a pretty common task that I struggle with in python, usually t...

do you have to compulsorily use pandas? it could be easier with just a simple iteration + slicing/splitting

red yew Mar 23, 2021, 7:35 PM

#

exotic maple well, first of all 1. conver the timestamps to a format understood by pandas. 2....

#1 and #2 I think are done, as long as I'm telling pandas to read the seconds since 1970 properly
#3 What would I be grouping/aggregating by? I see examples like "number of events per day of week" and such, but not just a graph of events over time summed by day

red yew Mar 23, 2021, 7:35 PM

#

grave frost do you have to compulsorily use pandas? it could be easier just a simple iterati...

I don't need pandas, I'm just very unfamiliar with how to properly plot things in Python

exotic maple Mar 23, 2021, 7:36 PM

#

if you have your days only, you can use

misty flint Mar 23, 2021, 7:36 PM

#

grave frost how is that bad?

just took a lot of time

red yew Mar 23, 2021, 7:36 PM

#

do you mean basically just count 24 hour periods from a start date to an end date, iterating over my sorted data and counting manually in a loop?

exotic maple Mar 23, 2021, 7:36 PM

#

df.groupby("date") -> this will groupby the table by the UNIQUE values of the grouping column

#

then, you can cast an aggregation function

misty flint Mar 23, 2021, 7:36 PM

#

numpy can do it in like 1/100th of the time

#

DoggoKek

exotic maple Mar 23, 2021, 7:36 PM

#

in your case you simply want sum so

grave frost Mar 23, 2021, 7:37 PM

#

misty flint just took a lot of time

its easy for small ones

exotic maple Mar 23, 2021, 7:37 PM

#

df.groupby("date").agg({"messages":sum})
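
A sketch of the whole pipeline on invented rows in the `epoch|author` layout. Since each row is one event and there is no numeric `messages` column to sum, this version counts rows per day with `.size()` instead; that substitution is an assumption about the data, not something stated in the chat.

```python
import pandas as pd

# Hypothetical rows in the same epoch|author layout as the pastebin sample.
df = pd.DataFrame(
    {
        "epoch": [1593316925.431, 1593316968.259, 1593403200.0, 1593662500.0],
        "author": ["user1", "user1", "user1", "user1"],
    }
)

# 1. Seconds since 1970 -> pandas datetimes.
df["when"] = pd.to_datetime(df["epoch"], unit="s")

# 2. Keep only the calendar day.
df["date"] = df["when"].dt.floor("D")

# 3. One row per event, so counting rows per day gives events per day.
per_day = df.groupby("date").size()

# Fill in days with no events so gaps show up as zeros.
per_day = per_day.asfreq("D", fill_value=0)
```

From there `per_day.plot()` should give events per day over time.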

misty flint Mar 23, 2021, 7:37 PM

#

...i didnt say it was not easy

#

memecringeharold

exotic maple Mar 23, 2021, 7:37 PM

#

you can also do it via pivot table, since pivot table is literally a grouping function as well

grave frost Mar 23, 2021, 7:37 PM

#

red yew I don't need pandas, I'm just very unfamiliar with how to properly plot things i...

ahh...you can then simply use split. can you show a sample of your data?

red yew Mar 23, 2021, 7:37 PM

#

Sure, just give me a minute to upload it

exotic maple Mar 23, 2021, 7:38 PM

#

grave frost ahh...you can then simply use split. can you show a sample of your data?

wouldnt that create many dfs?

#

the way i got it he only wants grouped sums

grave frost Mar 23, 2021, 7:38 PM

#

exotic maple wouldnt that create many dfs?

normal string split

exotic maple Mar 23, 2021, 7:38 PM

#

grave frost normal string split

ah lol

#

isnt it easier to convert to timestamp and extract day?

grave frost Mar 23, 2021, 7:38 PM

#

it depends

exotic maple Mar 23, 2021, 7:38 PM

#

that's the approach id use

grave frost Mar 23, 2021, 7:39 PM

#

I prefer the shortest route however dirty it might be 😁

exotic maple Mar 23, 2021, 7:39 PM

#

grave frost I prefer the shortest route however dirty it might be 😁

ah, a fellow lazyness connoiseur (or wtf that french thing is spelled)

grave frost Mar 23, 2021, 7:40 PM

#

I had a problem to solve to get the mean of nested arrays and this time I decided to do it properly with a class since it would be re-used. took me an hour

red yew Mar 23, 2021, 7:40 PM

#

@grave frost https://pastebin.com/aPgXA1DE

Pastebin

epoch|author1593316925.431|user11593316968.259|user11593316979.211|...

Pastebin.com is the number one paste tool since 2002. Pastebin is a website where you can store text online for a set period of time.

grave frost Mar 23, 2021, 7:40 PM

#

just to write this piece of shit:

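import numpy as np
from tqdm import tqdm

# bpemb_ny is defined elsewhere (not shown); it must provide .embed()

class BPE():
  def bpe_embed(self, arg):
    # embed each element of arg, then average everything into one vector
    vec = []

    for item in arg:
      vec.append(bpemb_ny.embed(item))

    return self.averager(vec)

  def averager(self, sentence_vec):
    # flatten the nested embeddings into one flat list of vectors
    averaged_vec = []

    for j in sentence_vec:
      for k in j:
        averaged_vec.append(k)

    # element-wise mean over all collected vectors
    avg = np.mean(averaged_vec, axis=0)
    #print("avg:", avg)
    return avg

  def final(self, arg):
    # one averaged vector per stanza, with a progress bar
    final = []
    for stanza in tqdm(arg):
      final.append(self.bpe_embed(stanza))

    print(final)
    return np.array(final)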

grave frost Mar 23, 2021, 7:41 PM

#

red yew <@!738058085083381760> https://pastebin.com/aPgXA1DE

so you just want to extract the epoch right?

#

!e

a_string = '1593316925.431|user1'
print(a_string.split('|')[0])

arctic wedgeBOT Mar 23, 2021, 7:42 PM

#

You are not allowed to use that command here. Please use the #bot-commands channel instead.

red yew Mar 23, 2021, 7:42 PM

#

grave frost so you just want to extract the epoch right?

oh no, I mean, I'm already reading the CSV file in and can manipulate the data as I want. I just want to go to the next step, to produce a graph that shows me the events over time, per day. So like "10 events on this day"

#

I can see programmatically sorting it and then iterating over it in 24 hour chunks, creating a new table that way

#

I've also used a pivot table for this in the past somehow

grave frost Mar 23, 2021, 7:43 PM

#

how does epoch and user correlate with day and events?

red yew Mar 23, 2021, 7:44 PM

#

each timestamp is an event that occurs on some day in time. The user column is mostly useless for this case, it's already filtered by user

grave frost Mar 23, 2021, 7:44 PM

#

and the number at the start?

red yew Mar 23, 2021, 7:45 PM

#

epoch|author is the row that defines the headers
1593316925.431|user1 is the first row of data, with the first column being an event that occurred on June 28th, 2020, 4:02:05 AM UTC, by user1

grave frost Mar 23, 2021, 7:46 PM

#

aight. so then just extract the day from each entry, put it in a list and then use matplotlib. what do you find difficulty in?

#

you can use count to then count the number of times it appears

red yew Mar 23, 2021, 7:47 PM

#

I guess I was expecting that matplotlib or pandas would have a function like hist() or something that would automatically know how to do this

#

I can extract the day and count programmatically, then plot that

#

I guess simpler is better. I was hoping for fanciness

grave frost Mar 23, 2021, 7:48 PM

#

I guess there might be some function 🤷 but I don't know

#

tho tbh you might be surprised how many common things we do not have functions (or libs) for

red yew Mar 23, 2021, 7:50 PM

#

could numpy help me perhaps? I'll need to sort the data by that column, find the min and max to get the date range, and then iterate through the data to sum up the counts per day. basically binning it manually

tidal bough Mar 23, 2021, 7:54 PM

#

red yew could numpy help me perhaps? I'll need to sort the data by that column, find the...

Note: if you end up implementing this manually, @numba.njit that function and you'll likely get pretty acceptable speeds.

#

as for a numpy solution, hmm.

exotic maple Mar 23, 2021, 7:55 PM

#

seaborn has histplot

#

and pandas has

#

df.plot.hist

#

i think

tidal bough Mar 23, 2021, 7:56 PM

#

what do you need exactly? Plot the counts of events per day? That seems like a histogram with fixed bin edges to me - if so, you can just use plt.hist or np.histogram.

red yew Mar 23, 2021, 7:56 PM

#

it's a one-time processing of, worst case, 310,000 rows, so I don't really mind the processing time

exotic maple Mar 23, 2021, 7:56 PM

#

in fact, matplotlib has histograms too...

#

https://matplotlib.org/stable/gallery/statistics/hist.html

tidal bough Mar 23, 2021, 7:56 PM

#

matplotlib's hist calls np's histogram, even.

red yew Mar 23, 2021, 7:56 PM

#

tidal bough what do you need exactly? Plot the counts of events per day? That seems like a h...

it spans over a year, so would a histogram work? Like it'd need to bin hours into a day, for maybe 400 days

exotic maple Mar 23, 2021, 7:56 PM

#

red yew it spans over a year, so would a histogram work? Like it'd need to bin hours int...

nah that would be awful

#

try weeks

tidal bough Mar 23, 2021, 7:56 PM

#

pretty much; you might just need to manually generate the bin edges

exotic maple Mar 23, 2021, 7:57 PM

#

you can try getting "week of year" (a number from 1 to 52) and generate a histogram from there

tidal bough Mar 23, 2021, 7:57 PM

#

but that's a pretty small function, comparatively; it's only 400 numbers

exotic maple Mar 23, 2021, 7:57 PM

#

by day is awful

#

you wont be able to read it

red yew Mar 23, 2021, 7:58 PM

#

my goal is to see a trend over time, where counts per day would give me a good indicator of daily activity that may fluctuate over time. Like, picture wanting to do analytics on a website where you see hits per day from one user

exotic maple Mar 23, 2021, 7:58 PM

#

tidal bough matplotlib's `hist` calls `np`'s `histogram`, even.

wait, its references all the way down?

red yew Mar 23, 2021, 7:58 PM

#

basically I'm doing a transformation of an unordered list of discrete events into a summation of hits per day

#

then plotting that

tidal bough Mar 23, 2021, 7:59 PM

#

I see. So yeah, that's just a histogram.

red yew Mar 23, 2021, 7:59 PM

#

ah

#

I've looked up guides on histograms but not found something clear about this specific thing

exotic maple Mar 23, 2021, 7:59 PM

#

@red yew try this

1. separate epoch from author as awesome told you
2. convert the epoch to a readable dt format
3. extract day / week from dt format
4. create histogram

red yew Mar 23, 2021, 7:59 PM

#

like I've tried this:

df = pd.read_csv('output.csv', header=0, delimiter='|', quotechar='^', quoting=csv.QUOTE_MINIMAL)
fig, ax = plt.subplots()
df["timestamp"].astype(np.int64).plot.hist(ax=ax)
labels = ax.get_xticks().tolist()
labels = pd.to_datetime(labels)
ax.set_xticklabels(labels, rotation=90)

plt.show()

but it resulted in a graph without sufficient bins

tidal bough Mar 23, 2021, 8:00 PM

#

As an example, just plt.hist it. The results will be horrible because it will by default choose like 10-20 evenly sized bins, but it should work. To make it right, pass the bins argument to it.

red yew Mar 23, 2021, 8:00 PM

#

exotic maple <@!436345744283402265> try this 1. separate epoch from author as awesome told yo...

ye that's the road I'm going down, though #4 I don't know how to do yet, or really #3 and how to store that data in a df

#

ah hm, let me see the bins arg

tidal bough Mar 23, 2021, 8:00 PM

#

> but it resulted in a graph without sufficient bins
yup, precisely, you'd have to specify their count and maybe also the precise positions.
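
A sketch of what "specify the precise positions" could look like for one bin per day, using invented epoch values. The edges sit on UTC midnights. `np.histogram` is used here so the counts can be inspected; `plt.hist(epochs, bins=edges)` accepts the same `bins` sequence.

```python
import numpy as np

# Hypothetical epoch timestamps (seconds) spanning a few days.
epochs = np.array([1593316925.431, 1593316968.259, 1593403200.0, 1593662500.0])

SECONDS_PER_DAY = 86_400

# Bin edges at UTC midnights covering the whole range.
first_edge = np.floor(epochs.min() / SECONDS_PER_DAY) * SECONDS_PER_DAY
last_edge = (np.floor(epochs.max() / SECONDS_PER_DAY) + 1) * SECONDS_PER_DAY
edges = np.arange(first_edge, last_edge + 1, SECONDS_PER_DAY)

# counts[i] is the number of events between edges[i] and edges[i + 1].
counts, _ = np.histogram(epochs, bins=edges)
```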

red yew Mar 23, 2021, 8:00 PM

#

current output

exotic maple Mar 23, 2021, 8:00 PM

#

red yew current output

i mean...thats what you want

#

its just the x axis is wrong

#

because you are using epochs and not human dates

red yew Mar 23, 2021, 8:01 PM

#

can I format the x-axis with a function that knows how to handle the epoch? Or do I need to change the input format

exotic maple Mar 23, 2021, 8:01 PM

#

i'd choose changing input format, much cleaner

#

and reproducible

#

but thats up to you

#

labels = pd.to_datetime(labels) this part of your code doesnt seem to be working

red yew Mar 23, 2021, 8:02 PM

#

I can put it into ISO8601 or something. I'll need to lookup how matplotlib handles datetimes I suppose

#

o

exotic maple Mar 23, 2021, 8:02 PM

#

ISO8601 my man here with ISO format

#

-hugs-

red yew Mar 23, 2021, 8:03 PM

#

we are nothing without standards

#

I don't know how to convert a column of data that's in a dataframe. I assume there's a transformation function that can be applied over it.

#

bins=365 improves things already

exotic maple Mar 23, 2021, 8:03 PM

#

pd.to_datetime or however the hell its spelled

#

buuuuuuuuuuuut

#

im not sure if datetime converts epochs

tidal bough Mar 23, 2021, 8:04 PM

#

should be possible to at least convert it to numpy's datetime type

red yew Mar 23, 2021, 8:04 PM

#

hmm to_datetime takes an strftime format string, hm

exotic maple Mar 23, 2021, 8:04 PM

#

https://exceptionshub.com/python-convert-unix-epoch-time-to-datetime-in-pandas.html

ExceptionsHub

admin

python – Convert Unix epoch time to datetime in Pandas

Questions: I have been searching for a long time to find a solution to my prob...

#

once you know what you want, its easy to google it 😉

#

#

pandas was literally built to handle annoying datetime stuff lol id be surprised if it didnt handle it

#

now, being a bit more..."scientific" why you are looking at that trend like that plasma? I think a more interesting observation could be:
1-) messages by day of week.
2-) seasonality of messages by month / day / week, etc
daily change itself doesnt seem too valuable to me there

#

in fact, your data has some very noticeable spikes, so there seems to be something there

red yew Mar 23, 2021, 8:08 PM

#

exotic maple now, being a bit more..."scientific" why you are looking at that trend like that...

in this particular case I'm going to correlate the trend over time with events that have happened at discrete times

exotic maple Mar 23, 2021, 8:08 PM

#

see, the spikes i mentioned :p

red yew Mar 23, 2021, 8:08 PM

#

like ideally, a vertical line in the graph with a label indicating what happened on that day

#

again, if it were a website, picture "sale on this day"

exotic maple Mar 23, 2021, 8:08 PM

#

you could add the following

#

compute the mean per day

#

and make a single horizontal line

#

to display it

#

and then color all the bars n-stds away from the mean

red yew Mar 23, 2021, 8:09 PM

#

that'd be neat

exotic maple Mar 23, 2021, 8:09 PM

#

id like to see your data plotted as a normal distribution

#

tbf it seems like it COULD approximate it

red yew Mar 23, 2021, 8:09 PM

#

I imagine there'd be some trends in day-of-week, just not interesting in my case

#

a running weekly average would be interesting too

#

spikes here probably line up with weekends
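
The running weekly average mentioned above could be sketched like this, assuming daily counts are already in a Series indexed by date (the numbers are invented, with weekend-like spikes).

```python
import pandas as pd

# Hypothetical daily counts for two weeks.
days = pd.date_range("2021-01-01", periods=14, freq="D")
per_day = pd.Series([3, 5, 4, 6, 20, 22, 4, 3, 5, 4, 6, 21, 19, 5], index=days)

# 7-day running mean; the first six days have no full window and come out as NaN.
weekly_avg = per_day.rolling(window=7).mean()
```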

exotic maple Mar 23, 2021, 8:10 PM

#

what I'm trying to say is: mark the mean. mark 1 std deviation above and below the mean

#

and color the bars ABOVE the 1-std differently
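
A possible sketch of that plot, assuming daily counts in a Series (invented values with one spike). The colour names and the use of a bar chart are choices made for the example, not something fixed by the discussion.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; drop this line for interactive use
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical daily event counts.
days = pd.date_range("2021-01-01", periods=10, freq="D")
per_day = pd.Series([4, 5, 6, 5, 4, 20, 5, 6, 4, 5], index=days)

mean = per_day.mean()
std = per_day.std()
upper = mean + std

# Bars above mean + 1 std get a different colour.
colors = ["tab:red" if v > upper else "tab:blue" for v in per_day]

fig, ax = plt.subplots()
ax.bar(per_day.index, per_day.values, color=colors)
ax.axhline(mean, color="black", label="mean")
ax.axhline(upper, color="gray", linestyle="--", label="mean + 1 std")
ax.axhline(mean - std, color="gray", linestyle="--", label="mean - 1 std")
ax.legend()
fig.autofmt_xdate()
```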

red yew Mar 23, 2021, 8:10 PM

#

I'd like to figure that out as an experiment, sure. It'd be neat to see

exotic maple Mar 23, 2021, 8:10 PM

#

that's pretty easy to plot :p

#

and it can visually display your idea of "something different happened here"

red yew Mar 23, 2021, 8:10 PM

#

I have no idea how to do that currently. both pandas and matplotlib are opaque to me, and most docs seem to be just SO questions/answers, or very verbose API references

twin moth Mar 23, 2021, 8:11 PM

#

Any idea which of those is better in order to count duplicates in a dataframe?

def count_duplicatives(df, col_name=None):
    return df.duplicated(col_name or df.columns.tolist()).sum()

def count_duplicatives(df, col_name=None):
    return df[df.duplicated(col_name or df.columns.tolist())].shape[0]

exotic maple Mar 23, 2021, 8:11 PM

#

you can call np.mean(df["value"]) on the column that holds your values

red yew Mar 23, 2021, 8:11 PM

#

I think I'll start with trying to fix these x-axis labels (still trying), and then drawing vertical lines for important events

red yew Mar 23, 2021, 8:11 PM

#

exotic maple you can call np.mean(df["value"]) on the column that holds your values

in this case my values are computed histograms...Do I have access to that generated data?

exotic maple Mar 23, 2021, 8:11 PM

#

https://matplotlib.org/stable/api/_as_gen/matplotlib.pyplot.annotate.html#matplotlib.pyplot.annotate

#

use this

red yew Mar 23, 2021, 8:11 PM

#

thanks

exotic maple Mar 23, 2021, 8:11 PM

#

instead of vertical bars

#

#

thats what you want no?

#

use annotate, much cleaner

red yew Mar 23, 2021, 8:12 PM

#

basically yea, and it sounds like the coordinate system is the x-axis by default, so I can specify an epoch time of the event

#

which I can figure out

exotic maple Mar 23, 2021, 8:13 PM

#

matplotlib is really cool but a massive pain in the ...

red yew Mar 23, 2021, 8:13 PM

#

so I've gathered! Do you have a preferred plotting lib?

#

I saw pyplot but it seemed very focused on web-based notebooks

exotic maple Mar 23, 2021, 8:13 PM

#

try seaborn

#

https://seaborn.pydata.org/tutorial/distributions.html

#

its prettier

#

and abstracts a lot of stuff you dont want

red yew Mar 23, 2021, 8:13 PM

#

cool

exotic maple Mar 23, 2021, 8:14 PM

#

and you can still reference matplotlib objects

#

since seaborn is built on top of matplotlib

#

#

thats the kind of plot that id like to see in your data. kde for values basically

#

if its normally or normal-like distributed, you can easily find outlier mathematically by declaring

#

Z scores

#

(how many standard devs is the value away from the mean)
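
As a rough illustration of the Z score idea on invented daily counts: subtract the mean, divide by the standard deviation, and flag values past some threshold. The threshold of 2 is an assumption picked for the example.

```python
import pandas as pd

# Hypothetical daily event counts with one obvious spike.
per_day = pd.Series([4, 5, 6, 5, 4, 20, 5, 6, 4, 5])

# z = (value - mean) / std
z = (per_day - per_day.mean()) / per_day.std()

# Assumed threshold: flag anything more than 2 standard deviations out.
outliers = per_day[z.abs() > 2]
```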

red yew Mar 23, 2021, 8:16 PM

#

but what if the outlier is a trend over time? Like "the user slowly stopped using this service over a period of 1 month"

#

one could calculate the weekly frequency of usage

exotic maple Mar 23, 2021, 8:16 PM

#

that's different. I would have to think it over

#

but thats not related to population

#

but to one user

#

so you'd have to compute it separately

#

i should be working on NLP but im finding your data more interesting lmao

#

I guess i like the intersection of marketing, analytics :p

red yew Mar 23, 2021, 8:17 PM

#

haha. I think I'd rather be working on NLP

exotic maple Mar 23, 2021, 8:19 PM

#

twin moth Any idea which of those is better in order to count duplicates in a dataframe? ...

uh. I'm 99% sure pandas already has a method

#

@twin moth https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.duplicated.html

#

people really need to check documentation more often

#

xD

red yew Mar 23, 2021, 8:24 PM

#

speaking of documentation, I'm trying to find out just what subplots() does and how I can go from this default bar graph to a connected line graph

#

and then I can have multiple lines indicating different users

tidal bough Mar 23, 2021, 8:26 PM

#

subplots is for several subfigures on one figure, basically

#

if you want to plot more than one plot on a figure (like, several lines), this is as simple as plotting them all between getting a new figure and showing it

red yew Mar 23, 2021, 8:27 PM

#

I wonder why the example I used had me use it. Maybe so I can control ax

#

o hm

tidal bough Mar 23, 2021, 8:27 PM

#

plt.figure()
plt.plot(...)
plt.plot(...)
plt.plot(...)
plt.show()

red yew Mar 23, 2021, 8:27 PM

#

#!/usr/bin/python3

import pandas as pd
import csv
import matplotlib.pyplot as plt
import numpy as np

df = pd.read_csv('output.csv', header=0, delimiter='|', quotechar='^', quoting=csv.QUOTE_MINIMAL)
df['timestamps'] = pd.to_datetime(df['timestamp'], unit='s')
fig, ax = plt.subplots()
df["timestamp"].astype(np.int64).plot.hist(ax=ax, bins=75)
labels = ax.get_xticks().tolist()
labels = pd.to_datetime(labels)
ax.set_xticklabels(labels, rotation=90)

plt.annotate("event 1", (1611925200, 100), color='r')
plt.axvline(x=1611925200, color='r')

plt.show()

in my case I'm not even sure why it's creating a bar graph

#

I assume it's being set to that by plt.subplots()

tidal bough Mar 23, 2021, 8:28 PM

#

bar graph?

red yew Mar 23, 2021, 8:28 PM

#

ye, currently looks like this

tidal bough Mar 23, 2021, 8:28 PM

#

uhh

#

so, just a histogram? I don't see how it's different from what you had before.

red yew Mar 23, 2021, 8:29 PM

#

it is, but, can the histogram data be displayed as a line graph?

twin moth Mar 23, 2021, 8:29 PM

#

exotic maple uh. I'm 99% sure pandas already has a method

That's literally the method I'm using...

exotic maple Mar 23, 2021, 8:29 PM

#

twin moth That's literally the method I'm using...

so you want to count them?

twin moth Mar 23, 2021, 8:29 PM

#

exotic maple so you want to count them?

Indeed

exotic maple Mar 23, 2021, 8:29 PM

#

you can just count the False instances then

#

actually

tidal bough Mar 23, 2021, 8:30 PM

#

red yew it is, but, can the histogram data be displayed as a line graph?

can the histogram data be displayed as a line graph?
~~So, like, same thing but horizontally?~~
Oh, I got what you mean. You'd need to use np.histogram instead to give you the raw data; then plot it with just plt.plot.

twin moth Mar 23, 2021, 8:30 PM

#

The Trues and that's what I did kinda

red yew Mar 23, 2021, 8:30 PM

#

tidal bough > can the histogram data be displayed as a line graph? ~~So, like, same thing bu...

ahh, I see

exotic maple Mar 23, 2021, 8:30 PM

#

since True has value 1. You can do
len(df) - sum(column)

tidal bough Mar 23, 2021, 8:30 PM

#

plt.hist calls np.histogram, so their arguments are pretty much the same.
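A minimal sketch of that idea, assuming the timestamps are epoch seconds as in the script above. The synthetic data and the helper names `histogram_line` and `plot_histogram_as_line` are made up for illustration. `np.histogram` returns the bin counts and the bin edges, and the counts can then be drawn against the bin centers with a plain `plt.plot`:

```python
import numpy as np


def histogram_line(values, bins=75):
    """Return (bin_centers, counts) so a histogram can be drawn as a line."""
    counts, edges = np.histogram(values, bins=bins)
    centers = (edges[:-1] + edges[1:]) / 2  # midpoint of each bin
    return centers, counts


def plot_histogram_as_line(values, bins=75, label=None):
    # Imported here so the numeric helper above works without matplotlib.
    import matplotlib.pyplot as plt

    centers, counts = histogram_line(values, bins=bins)
    plt.plot(centers, counts, label=label)  # call once per user for several lines


# Synthetic epoch-second timestamps standing in for df["timestamp"].
rng = np.random.default_rng(0)
timestamps = rng.integers(1_611_900_000, 1_611_950_000, size=1_000)
centers, counts = histogram_line(timestamps, bins=75)
```

Calling `plot_histogram_as_line(...)` once per user, followed by `plt.legend()` and `plt.show()`, should put several lines on one figure.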

exotic maple Mar 23, 2021, 8:30 PM

#

twin moth The `True`s and that's what I did kinda

since True has value 1. You can do
len(df) - sum(column)

#

try that

#

or something similar

#

basically

#

Trues are 1

twin moth Mar 23, 2021, 8:31 PM

#

exotic maple since True has value 1. You can do len(df) - sum(column)

Both of my codes work... I'm just asking what's better

exotic maple Mar 23, 2021, 8:31 PM

#

if you subtract their sum from the number of rows, you get the Falses

twin moth Mar 23, 2021, 8:31 PM

#

I did exactly that by using .sum()

exotic maple Mar 23, 2021, 8:31 PM

#

shape should be faster but does it give you the same result?

twin moth Mar 23, 2021, 8:31 PM

#

Both work exactly the same

#

I get that generally shape is faster

#

But is it really faster in this implementation even though I create a whole new DF just for that?

exotic maple Mar 23, 2021, 8:33 PM

#

unfortunately i cant answer that confidently

#

so id rather not misinform you

#

time them

#

and if shape is faster, it is
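A rough way to time them, as a sketch only: the two functions below are guesses at what the two versions look like (the actual code wasn't posted), and the DataFrame is synthetic. `timeit` from the standard library runs each version a fixed number of times:

```python
import timeit

import numpy as np
import pandas as pd

# Synthetic data standing in for the real DataFrame.
rng = np.random.default_rng(0)
df = pd.DataFrame({"user": rng.integers(0, 1_000, size=10_000)})


def count_with_sum():
    # True counts as 1, so summing the boolean mask counts the duplicates.
    return int(df.duplicated().sum())


def count_with_shape():
    # Builds a filtered DataFrame first, then reads its row count.
    return df[df.duplicated()].shape[0]


# Both approaches should agree before it makes sense to compare speed.
assert count_with_sum() == count_with_shape()

for func in (count_with_sum, count_with_shape):
    seconds = timeit.timeit(func, number=20)
    print(f"{func.__name__}: {seconds:.4f}s for 20 runs")
```

The numbers will depend on the data size and the pandas version, so it's worth running this on the real DataFrame.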

twin moth Mar 23, 2021, 8:44 PM

#

Thanks 🙂

#

for col in df.select_dtypes(exclude=['int64','float64']):
    most_common = df[col].mode()[0]
    df[col].fillna(most_common, inplace=True)

How would you guys achieve that?

#

Gives me the following error A value is trying to be set on a copy of a slice from a DataFrame

woven pumice Mar 23, 2021, 9:21 PM

#

The SettingWithCopyWarning should just be a warning, not an error, and I am actually able to run your code without issue. Another method would be
df.loc[:,col] = df[col].fillna(most_common). Also, using scikit-learn's SimpleImputer (the replacement for the older Imputer class) with strategy='most_frequent' may be a more effective way of filling missing data in preprocessing

vestal bough Mar 23, 2021, 9:24 PM

#

I was wondering about using K-means clustering for a multidimensional dataset.
Being a geometrical method, how can I be sure that the clustering has been done correctly? (Not having visualization feedback)

twin moth Mar 23, 2021, 9:44 PM

#

woven pumice The SettingWithCopyWarning should just be a warning not an error, and I am actua...

I got that working:

    most_common = 9999999
    new_df[new_df.select_dtypes(exclude=['int64','float64']).columns.tolist()] = new_df.select_dtypes(exclude=['int64','float64']).fillna(most_common)

#

Got any idea how to fetch the most common value for each column without iterating through it?

woven pumice Mar 23, 2021, 9:59 PM

#

you could do something like new_df.mode().iloc[0]

#

I think using an Imputer may make your life easier however: https://scikit-learn.org/0.18/modules/generated/sklearn.preprocessing.Imputer.html

twin moth Mar 23, 2021, 10:03 PM

#

woven pumice you could do something like `new_df.mode().iloc[0]`

Actually I did the following:

def replace_missing_values(df, col_to_def_val_dict):
    new_df = df.copy()
    new_df.fillna(col_to_def_val_dict, inplace=True)
    
    new_df[new_df.select_dtypes(exclude=['int64','float64']).columns.tolist()] = new_df.select_dtypes(exclude=['int64','float64']).fillna(new_df.mode())
    new_df[new_df.select_dtypes(include=['int64','float64']).columns.tolist()] = new_df.select_dtypes(include=['int64','float64']).fillna(new_df.median())
    return new_df

#

Doesn't always work though

#

Some of those values stay NaN

#

If I replace new_df.mode() with a single string it "works"

#

Otherwise it just stays the same

exotic maple Mar 23, 2021, 10:21 PM

#

vestal bough I was wondering about using K-means clustering for a multidimensional dataset. Being ...

I dont think there's a way to verify

#

by definition K-means is an unsupervised / descriptive model, and K-means requires predetermining the number of clusters you want to use

#

If you have no idea of how many you have perhaps try using DBSCAN?
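There's no ground truth to check against, but an internal measure such as the silhouette score can give a rough signal of how well separated the clusters are in the full feature space. Below is a small NumPy-only sketch of the mean silhouette score. The function name, the toy points and the labels are all made up for illustration, and a library implementation would normally be used instead:

```python
import numpy as np


def mean_silhouette(points, labels):
    """Mean silhouette score for a labelled set of points (needs >= 2 clusters)."""
    points = np.asarray(points, dtype=float)
    labels = np.asarray(labels)
    # Pairwise Euclidean distances between all points.
    dists = np.linalg.norm(points[:, None, :] - points[None, :, :], axis=-1)
    clusters = np.unique(labels)
    scores = np.zeros(len(points))
    for i in range(len(points)):
        same = labels == labels[i]
        n_same = same.sum()
        if n_same == 1:
            continue  # singleton cluster: score is 0 by convention
        # a: mean distance to the other points in the same cluster
        # (the distance to itself is 0, so dividing by n_same - 1 excludes it).
        a = dists[i, same].sum() / (n_same - 1)
        # b: mean distance to the nearest other cluster.
        b = min(dists[i, labels == c].mean() for c in clusters if c != labels[i])
        denom = max(a, b)
        scores[i] = (b - a) / denom if denom > 0 else 0.0
    return scores.mean()


toy_points = [[0, 0], [0, 1], [10, 10], [10, 11]]
good = mean_silhouette(toy_points, [0, 0, 1, 1])  # sensible grouping
bad = mean_silhouette(toy_points, [0, 1, 0, 1])   # mixed-up grouping
print(good, bad)
```

Scores close to 1 suggest tight, well-separated clusters, while scores near 0 or below suggest overlapping or wrongly assigned points.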

twin moth Mar 23, 2021, 10:25 PM

#

twin moth Actually I did the following: ```py def replace_missing_values(df, col_to_def_v...

YAS!

def replace_missing_values(df, col_to_def_val_dict):
    new_df = df.copy()
    
    new_df.fillna(col_to_def_val_dict, inplace=True)
    
    c_df = new_df.select_dtypes(exclude=['int64','float64'])
    new_df[c_df.columns.tolist()] = c_df.fillna(c_df.mode().iloc[0])
    
    c_df = new_df.select_dtypes(include=['int64','float64'])
    new_df[c_df.columns.tolist()] = c_df.fillna(c_df.median())
    
    return new_df

#

Now it works!
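A likely reason for the difference, shown as a small sketch with a made-up `city` column: `mode()` returns a DataFrame whose rows are numbered 0, 1, ..., and `fillna` with a DataFrame aligns on the row index, so only NaNs in those rows get filled. Taking `.iloc[0]` gives a Series keyed by column name, which fills every NaN in each column:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"city": ["Oslo", np.nan, "Oslo", np.nan, "Rome"]})

# mode() is a one-row DataFrame here (index 0), so the NaNs in rows 1 and 3
# have nothing to align with and stay NaN.
filled_by_frame = df.fillna(df.mode())

# mode().iloc[0] is a Series indexed by column name, so it applies per column.
filled_by_series = df.fillna(df.mode().iloc[0])

print(filled_by_frame["city"].isna().sum())
print(filled_by_series["city"].isna().sum())
```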

strong zephyr Mar 23, 2021, 10:30 PM

#

Nothing advanced, but for those interested in data / cicd pipelines
https://medium.com/analytics-vidhya/creating-a-data-pipeline-with-easyjobs-fastapi-4e302556f05d

Medium

Creating a Data Pipeline with EasyJobs & FastAPI

FastAPI is quickly making a name for itself in the python community for its ease of use in developing RestAPI’s for nearly anything.

uncut barn Mar 23, 2021, 10:33 PM

#

import torch
import torch.nn as nn
import torch.nn.functional as F


class encoder(nn.Module):
    def __init__(self, n_inputs=40):
        super(encoder, self).__init__()
        self.n_inputs = n_inputs
        # Pick a random number of context points (1..n_inputs) and their indices, once, at construction
        self.N_c = torch.randint(1, n_inputs + 1, (1,)).item()
        self.random_indices = torch.randperm(self.n_inputs)[:self.N_c]

        self.fc_enc1 = nn.Linear(2, 64)
        self.fc_enc2 = nn.Linear(64, 32)
        self.fc_enc3 = nn.Linear(32, 2)

        torch.nn.init.normal_(self.fc_enc1.weight, std=0.01)
        torch.nn.init.zeros_(self.fc_enc1.bias)
        torch.nn.init.normal_(self.fc_enc2.weight, std=0.01)
        torch.nn.init.zeros_(self.fc_enc2.bias)
        torch.nn.init.normal_(self.fc_enc3.weight, std=0.01)
        torch.nn.init.zeros_(self.fc_enc3.bias)

    def forward(self, X, y):
        # X and y need to be 3-D so that the concatenated last dimension is 2
        x_c, y_c = X[:, self.random_indices], y[:, self.random_indices]
        pairs = torch.cat((x_c, y_c), 2)
        h1_enc_output = F.relu(self.fc_enc1(pairs))
        h2_enc_output = F.relu(self.fc_enc2(h1_enc_output))
        r_c = F.relu(self.fc_enc3(h2_enc_output))
        return r_c

would this be a possible way for a model to accept an arbitrary number of inputs?

exotic maple Mar 23, 2021, 10:36 PM

#

strong zephyr Nothing advanced, but for those interested in data / cicd pipelines https://me...

omg tools hell. I have so many useful tools lmao

#

though, not interested in pipelines...atm

woven pumice Mar 23, 2021, 10:39 PM

#

twin moth Now it works!

After some searching around this also seems to work

import numpy as np
import pandas as pd

df = pd.DataFrame({'a': [1] * 3 + [2] * 3 + [np.nan] * 2,
                   'b': [True, True, True, True, True, False, np.nan, np.nan],
                   'c': [1.0, 2.0, 3.0, np.nan, np.nan, 6.0, 7.0, 8.0]})
print(df.head(10))
# Assign the result back instead of using inplace=True on a column selection
df['a'] = df['a'].fillna(df['a'].mean())
df['b'] = df['b'].fillna(df['b'].mode().iloc[0])
df['c'] = df['c'].fillna(df['c'].median())
print(df.head(10))

strong zephyr Mar 23, 2021, 10:41 PM

#

@exotic maple pipelines are just one possibility with easy jobs, but the next closest parallel is celery 🙂

twin moth Mar 23, 2021, 10:48 PM

#

woven pumice After some searching around this also seems to work ```import numpy as np impor...

That's basically what I did

#

But I replaced a bunch of columns in each operation while you only did one at a time

woven pumice Mar 23, 2021, 10:55 PM

#

Ah, I see. Seems like a good approach

exotic maple Mar 23, 2021, 11:01 PM

#

that's not a DS question, but are you looping through those images?

grave frost Mar 23, 2021, 11:11 PM

#

!code

arctic wedgeBOT Mar 23, 2021, 11:11 PM

#

Here's how to format Python code on Discord:

```py
print('Hello world!')
```

These are backticks, not quotes. Check this out if you can't find the backtick key.

exotic maple Mar 23, 2021, 11:12 PM

#

well, while does not create an i value

#

you're better off using a for loop

#

but if you insist on a while

#

you can do something like

#

While i

#

i = 0
while i < len(df):
    # DO SOMETHING HERE, e.g. with df.iloc[i]
    i += 1
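For comparison, a small sketch of the while version next to the for-loop version. The DataFrame and its `path` column are hypothetical, just something to iterate over:

```python
import pandas as pd

# Hypothetical frame standing in for the real one.
df = pd.DataFrame({"path": ["a.png", "b.png", "c.png"]})

# while loop: the counter has to be created and advanced by hand.
seen_while = []
i = 0
while i < len(df):
    seen_while.append(df.iloc[i]["path"])
    i += 1

# for loop: itertuples hands over one row at a time, no counter needed.
seen_for = []
for row in df.itertuples(index=False):
    seen_for.append(row.path)

print(seen_while == seen_for)
```

If the per-row work can be expressed as a column operation, skipping the loop entirely is usually faster still.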

analog cave Mar 23, 2021, 11:14 PM

#

does df stand for dataframe?

exotic maple Mar 23, 2021, 11:14 PM

#

yes

#

its the standard short version

serene scaffold Mar 23, 2021, 11:21 PM

#

exotic maple ```py While i < len(df) DO SOMETHING HERE i += 1 ```

Surely there's a better way?

exotic maple Mar 23, 2021, 11:22 PM

#

Ofc there is but I'm playing Minecraft with my daughter 😂 can't think right now

serene scaffold Mar 23, 2021, 11:22 PM

#

Cute

twin moth Mar 23, 2021, 11:25 PM

#

woven pumice Ah, I see. Seems like a good approach

Thanks 🙂

grave frost Mar 24, 2021, 12:04 AM

#

exotic maple Ofc there is but I'm playing Minecraft with my daughter 😂 can't think right now

lemon_surprised how old are you?

exotic maple Mar 24, 2021, 12:05 AM

#

28

grave frost Mar 24, 2021, 12:07 AM

#

I thought you were somewhere around 22-23 😂

velvet thorn Mar 24, 2021, 12:08 AM

#

woven pumice After some searching around this also seems to work ```import numpy as np impor...

you can pass a dict to fillna to make it more efficient
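A sketch of the dict form, reusing the example frame from the message being replied to. The keys are column names and the values are the fill value for each column, so everything is filled in one call:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'a': [1] * 3 + [2] * 3 + [np.nan] * 2,
                   'b': [True, True, True, True, True, False, np.nan, np.nan],
                   'c': [1.0, 2.0, 3.0, np.nan, np.nan, 6.0, 7.0, 8.0]})

# One fill value per column: mean for 'a', most common value for 'b', median for 'c'.
fill_values = {
    'a': df['a'].mean(),
    'b': df['b'].mode().iloc[0],
    'c': df['c'].median(),
}
filled = df.fillna(fill_values)
print(filled)
```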

dark sonnet Mar 24, 2021, 1:10 AM

#

hi

serene scaffold Mar 24, 2021, 2:06 AM

#

@dark sonnet hi. Do you wanna talk about data science?

gray arch Mar 24, 2021, 2:33 AM

#

velvet thorn you can pass a `dict` to `fillna` to make it more efficient

this seems to be a norm haha

exotic maple Mar 24, 2021, 2:45 AM

#

grave frost I thought you were somewhere around 22-23 😂

idk why you thought that

#

growing old doesnt mean you need to be grumpy :v

fleet hare Mar 24, 2021, 3:09 AM

#

Anyone have suggestions on repo structure/templates for ml projects? Specifically a project with heavy experimentation but also a deployed production model

exotic maple Mar 24, 2021, 3:10 AM

#

man i still dont get how to upload files to google colab

#

that shit is as cryptic as matplotlib docs...

fleet hare Mar 24, 2021, 3:10 AM

#

Pretty sure it's just drag and drop

serene scaffold Mar 24, 2021, 3:14 AM

#

fleet hare Anyone have suggestions on repo structure/templates for ml projects? Specificall...

so you have a repo that has a pretrained model, but you can also use it to train new models?

exotic maple Mar 24, 2021, 3:21 AM

#

fleet hare Pretty sure it's just drag and drop

on the notebook?

fleet hare Mar 24, 2021, 3:22 AM

#

serene scaffold so you have a repo that has a pretrained model, but you can also use it to train...

Yeah or many different types of models and data preprocessing and everything else that you mess around with while working on a project. Basically what’s the best way to structure a repo to keep track of these experiments but also have a production model.

Looking for something like this: https://github.com/jeremyjordan/data-science-template or this: https://github.com/ml-tooling/ml-project-template but I wanted to see if there was anything else out there

GitHub

jeremyjordan/data-science-template

Contribute to jeremyjordan/data-science-template development by creating an account on GitHub.

GitHub

ml-tooling/ml-project-template

ML project template facilitating both research and production phases. - ml-tooling/ml-project-template

#

The second one seems like a bit of an anti-pattern since it's basically the same as splitting the research and production into 2 different repos, which I don't want to do

fleet hare Mar 24, 2021, 3:26 AM

#

exotic maple on the notebook?

You have to click the folder icon on the menu on the left side when you’re in a notebook and then you can drag and drop files or there should be an upload button

exotic maple Mar 24, 2021, 3:27 AM

#

@fleet hare I dont know

#

but i will find you, and i will hug you

#

thanks

#

lmao

serene scaffold Mar 24, 2021, 3:29 AM

#

@fleet hare I'm not sure I understand the issue with having the user-facing code in a separate repository if the research-specific code isn't useful to them.

lapis sequoia Mar 24, 2021, 6:44 AM

#

It's a way of representing documents usually employed in Information Retrieval or learning from texts. Each document (observation) is modelled as a vector of N dimensions, where N is the number of words, terms or whatever base unit you are working with. If the document contains a given word, then the corresponding element of the vector is not zero. It's a generalization of the Standard Boolean Model, where elements of a vector can only take values 0 or 1.
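A tiny sketch of that representation in plain Python, with two made-up documents and raw word counts as the weights (real systems often use other weightings). The Boolean version just records whether each term is present:

```python
documents = ["the cat sat", "the dog sat on the mat"]

# The vocabulary fixes the N dimensions: one per distinct term.
vocabulary = sorted({word for doc in documents for word in doc.split()})


def count_vector(doc):
    """Vector of term counts, one element per vocabulary term."""
    words = doc.split()
    return [words.count(term) for term in vocabulary]


def boolean_vector(doc):
    """Boolean-model vector: 1 if the term occurs in the document, else 0."""
    return [1 if count > 0 else 0 for count in count_vector(doc)]


count_vectors = [count_vector(doc) for doc in documents]
boolean_vectors = [boolean_vector(doc) for doc in documents]
print(vocabulary)
print(count_vectors)
print(boolean_vectors)
```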

short heart Mar 24, 2021, 7:16 AM

#

#data-science-and-ml message

#

Link to my recent problem

solemn atlas Mar 24, 2021, 9:23 AM

#

Hello Gentlemen,
Hope you have a very enormous day,
I am new to AI and stuff. I want to write my very first neural network, just to get started, but on YouTube I can't find an appropriate video. If you guys can suggest some YouTube videos for an absolute beginner who knows Python programming up to a certain extent (not a pro though), that would be great 😁

grave frost Mar 24, 2021, 9:57 AM

#

exotic maple idk why you thought that

subconsciously 🙂 🤷

short heart Mar 24, 2021, 10:38 AM

#

So I'm using an LSTM. I'm using 60 values to predict 1 value, then I append it to those 60, remove the first value, and predict again. But my model doesn't seem to be very effective. Is it worth giving it more data (which is going to take a long time), or do I have to change the model somehow?
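The rolling setup described here can be sketched as below. `predict_next` is a hypothetical stand-in for the trained LSTM's predict call (it just averages the last three values), so only the windowing logic is meant to match:

```python
import numpy as np


def predict_next(window):
    # Hypothetical placeholder for the LSTM; a real model call would go here.
    return float(np.mean(window[-3:]))


def rolling_forecast(history, steps, window_size=60):
    """Predict `steps` values ahead, feeding each prediction back into the window."""
    window = list(history[-window_size:])
    predictions = []
    for _ in range(steps):
        next_value = predict_next(np.asarray(window))
        predictions.append(next_value)
        # Drop the oldest value and append the newest prediction.
        window = window[1:] + [next_value]
    return predictions


history = np.arange(100.0)
predictions = rolling_forecast(history, steps=5)
print(predictions)
```

One thing this makes visible: after the first step the model is partly predicting from its own outputs, so errors can build up the further ahead it goes.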

grave frost Mar 24, 2021, 11:03 AM

#

short heart So im using lstm. Im using 60 values to predict 1 value, append it to these 60 a...

what? are you trying to do time series prediction?

sonic raft Mar 24, 2021, 11:08 AM

#

Hi! I've been struggling to understand why we need nonlinearity in neural networks, i.e. why we need activation functions. Also, for example, in the case of the famous MNIST dataset, where the images are 28*28, we construct the weight matrix with dimensions (28 * 28, 30) (30 is just an example), but the point is that the second dimension is bigger than one. Why? 😄 What does it look like when an input flows through a neural network with two layers, the first (28 * 28, 30) and the second (30, 1)?
That's my biggest problem: I have no idea what it looks like when the pixels of an image (the input data) flow through the network.
(fastai fastbook chapter: https://github.com/fastai/fastbook/blob/master/04_mnist_basics.ipynb)

short heart Mar 24, 2021, 11:16 AM

#

grave frost what? are you trying to do time series prediction?

yeah kinda

winged yew Mar 24, 2021, 11:22 AM

#

heyy anyone

#

who know data science fully

tidal bronze Mar 24, 2021, 11:25 AM

#

can you use silhouette score to compare different kmeans aggregation that use different features?

tidal bough Mar 24, 2021, 11:43 AM

#

sonic raft Hi! I've been struggling to understand why we need nonlinearity for neural netwo...

why we need nonlinearity for neural networks, why we need to use activation functions..
Because it's easy to prove that if you only use linear activation functions, then your network, no matter how deep or wide it is, is equivalent to just a linear function from inputs to outputs. And, well, linear function do not useful computation make. A linear function classifying images as cat or not would have some pixels with positive weights and some with negative ones - make all the former ones pure white, all the latter ones pure black, and you'll get a "perfect" cat image as far as the network is concerned.

#

Don't think I understand the rest of your question.

sonic raft Mar 24, 2021, 11:52 AM

#

tidal bough Don't think I understand the rest of your question.

I don't understand it either, I just can't imagine what it looks like when our input data "flows" through the network, and the features that it constructs.
Furthermore, as the image shows, instead of creating the weights with one column we create 30 columns. Maybe that gives you some idea of what I was trying to say.

tidal bough Mar 24, 2021, 11:53 AM

#

This is a network with 28*28 inputs, 30 neurons in the first (and only) hidden layer, and 1 output

#

So the matrix that transforms from first (inputs) to second(first hidden) layer is (28*28) x 30, and the matrix transforming from the first hidden to the outputs is 30 x 1.
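To make the shapes concrete, here is a NumPy sketch of one pass through such a network. The random weights, the batch of 64 fake images and the ReLU on the hidden layer are assumptions for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)

batch = rng.random((64, 28 * 28))       # 64 flattened 28x28 images
w1 = rng.normal(size=(28 * 28, 30))     # inputs -> 30 hidden neurons
b1 = np.zeros(30)
w2 = rng.normal(size=(30, 1))           # 30 hidden neurons -> 1 output
b2 = np.zeros(1)

# Each hidden neuron is a different weighted sum of all 784 pixels, then ReLU.
hidden = np.maximum(batch @ w1 + b1, 0)
# The output is a weighted sum of the 30 hidden activations.
output = hidden @ w2 + b2

print(batch.shape, hidden.shape, output.shape)
```

So every image turns into 30 numbers in the hidden layer and then into a single number at the output.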

sonic raft Mar 24, 2021, 12:04 PM

#

I see, but Why is it good to have more and more neurons? I mean I know that it will make the network deeper and deeper, and it will perform better, but why?

tidal bough Mar 24, 2021, 12:05 PM

#

Well, the more complexity, the more complex relationships the network can approximate.

#

Check out https://neuralnetworksanddeeplearning.com/chap4.html for a visual explanation

short heart Mar 24, 2021, 12:06 PM

#

can someone help me with lstm

sonic raft Mar 24, 2021, 12:09 PM

#

tidal bough Check out https://neuralnetworksanddeeplearning.com/chap4.html for a visual expl...

I've seen many visual explanations, but the thing I don't understand is the relationship between the layers. I thought they do the same things, but I often hear that they have objectives, I guess because of nonlinearity, but I can't see the whole picture

tidal bough Mar 24, 2021, 12:11 PM

#

If your layers are just dense like here, there's no meaningful "purpose" of each layer. They just do some stuff that, after training, ends up being involved somehow in calculating the result.

#

If the layers are different, like how in image classification neural networks the first few layers usually do convolutions and stuff, then you can say that the first few ones do stuff like search for lines, then for angles and more complex details - but even that's mostly a guess.

#

Generally speaking, neural networks just work - their training adjusts the flow between each layer so that the whole ends up doing the task you're making it do. There's no guarantee you can describe the purpose of any specific layer.

#

You can try searching for research on that matter though, maybe there are papers about trying to determine the function of parts of trained neural networks.

sonic raft Mar 24, 2021, 12:16 PM

#

because I constantly think that the whole matrix multiplication it does can be done by just simply one layer, because they basically multiply the layers with weights and weights, but I guess I can explain why this works like that with nonlinearity, like ReLU

#

😄

tidal bough Mar 24, 2021, 12:16 PM

#

because I constantly think that the whole matrix multiplication it does can be done by just simply one layer, because they basically multiply the layers with weights and weights
yup, this is precisely why activation functions are needed - without them (or if they were linear), like I said, the entire network can be collapsed into just one layer mapping from inputs to outputs
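A quick NumPy check of that collapse, as a sketch with small random matrices and biases left out for brevity. Two linear layers give the same result as a single layer whose weight matrix is the product of the two, and putting a ReLU in between breaks that equivalence:

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=(5, 4))    # 5 samples, 4 input features
w1 = rng.normal(size=(4, 3))   # first layer weights
w2 = rng.normal(size=(3, 2))   # second layer weights

# Two purely linear layers...
two_linear_layers = (x @ w1) @ w2
# ...match one layer whose weight matrix is w1 @ w2.
single_layer = x @ (w1 @ w2)
print(np.allclose(two_linear_layers, single_layer))

# With a ReLU between the layers, the single matrix no longer reproduces the output.
with_relu = np.maximum(x @ w1, 0) @ w2
print(np.allclose(with_relu, single_layer))
```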

sonic raft Mar 24, 2021, 12:17 PM

#

tidal bough Generally speaking, neural networks just work - their training adjusts the flow ...

Yes, I see but sometimes ReLU just transforms the negative weights to zero, making its gradient zero too

tidal bough Mar 24, 2021, 12:19 PM

#

yeah, it's pretty weird, and yet ReLU is newer to become popular - stuff like logistic and tanh are the older ones. Apparently ReLU was shown to be better, and I don't think I know enough to understand why.

sonic raft Mar 24, 2021, 12:19 PM

#

Yes, I guess it's just trying to push the parameters that are important to be positive and the less important ones to be closer to negative 😄

sonic raft Mar 24, 2021, 12:21 PM

#

tidal bough yeah, it's pretty weird, and yet ReLU is *newer* to become popular - stuff like ...

Anyways, thank you for your kind explanation I think I get the whole picture now.

quiet dawn Mar 24, 2021, 12:48 PM

#

I have a question

#

what is the best source for learning pytorch

#

but it should be beginner level

lapis sequoia Mar 24, 2021, 1:06 PM

#

I think the official website is a good place to start https://pytorch.org/tutorials/index.html

quiet dawn Mar 24, 2021, 1:09 PM

#

i didn't check it completely but it doesn't seem to cover the math side of AI

#

probably