#data-science-and-ml | Python | Page 220

uncut shadow Mar 23, 2020, 10:33 AM

#

well

#

that's how it works

#

I mean

velvet thorn Mar 23, 2020, 10:42 AM

#

@silk forge are you Indian?

#

your y should be 1D, not 2D, in this case

silk forge Mar 23, 2020, 10:46 AM

#

is that an indian thing

#

anyway yeah

#

1d

velvet thorn Mar 23, 2020, 10:47 AM

#

"doubt" to mean "question" is a very Indian thing

#

along with "the same"

#

y = data[['CO2EMISSIONS']] will give you 2D

silk forge Mar 23, 2020, 10:47 AM

#

yeah i got that

#

supposed to be 1d

#

@velvet thorn you must have a shitload of experience with indians i suppose?

velvet thorn Mar 23, 2020, 10:49 AM

#

I'm from a country with a sizable Indian minority, and I've worked in a startup with like 80% Indians

#

there are distinct speech pattern differences between Indians in my country and Indians from India

silk forge Mar 23, 2020, 10:49 AM

#

uhm what country exactly

velvet thorn Mar 23, 2020, 10:49 AM

#

Singapore

#

but the Indians here are generally 3rd or 4th generation so they don't resemble India Indians that much

silk forge Mar 23, 2020, 10:51 AM

#

oh

#

and yeah another small "doubt"

📎 unknown.png

#

isn't this the reason why x is represented as x = data[['ENGINESIZE']]

velvet thorn Mar 23, 2020, 10:53 AM

#

x is fine

#

x should be 2D

#

rows are samples, columns are features

silk forge Mar 23, 2020, 10:53 AM

#

isnt this the reason though

velvet thorn Mar 23, 2020, 10:54 AM

#

uh

#

not just that

#

okay, maybe you could explain what you mean by that image

#

because we should be more or less saying the same thing

silk forge Mar 23, 2020, 10:56 AM

#

you sure about y being 1D?

#

i cant understand that part still

velvet thorn Mar 23, 2020, 10:57 AM

#

okay I'm currently doing something

#

I'll get back to you

#

in an hour or so

silk forge Mar 23, 2020, 11:14 AM

#

Andrew Ng didn't say anything about this 1D and 2D stuff

velvet thorn Mar 23, 2020, 11:32 AM

#

okay

#

so, basically

#

the standard way of storing data is as a 2D array

#

where each row (1st axis) represents a sample and each column (2nd axis) represents a type of observations

#

therefore, X should always be 2D.

#

in some cases, you may have only one sample, or only one type of observation (feature).

#

but that doesn't make your data 1D

#

it just means that one dimension is 1

#

now, for y, assuming you're only making a prediction on one variable, it should be 1D

#

because it's basically another type of observation, except it's the target
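
The shapes described above can be checked directly in pandas. This is a minimal sketch, assuming a toy stand-in for the `data` frame from the conversation (the numbers are invented for illustration):

```python
import pandas as pd

# Toy stand-in for the `data` DataFrame; the values are invented.
data = pd.DataFrame({
    "ENGINESIZE": [2.0, 2.4, 1.5, 3.5],
    "CO2EMISSIONS": [196, 221, 136, 255],
})

X = data[["ENGINESIZE"]]        # double brackets -> DataFrame, 2D: (samples, features)
y_2d = data[["CO2EMISSIONS"]]   # double brackets -> still 2D, shape (N, 1)
y_1d = data["CO2EMISSIONS"]     # single brackets -> Series, 1D, shape (N,)

print(X.shape, y_2d.shape, y_1d.shape)
```

With a single feature, `X` keeps two axes and the second one simply has length 1, while the single-bracket selection for `y` drops to one axis.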

silk forge Mar 23, 2020, 1:05 PM

#

@velvet thorn would that be the same case for multivariate linear regression?

velvet thorn Mar 23, 2020, 1:06 PM

#

yes, sklearn treats simple and multivariate linear regression similarly

#

in both cases X is 2D

#

just that in SLR its shape is (N, 1), where N is the number of samples

#

okay, wait, I should clarify

#

if you mean "multivariate" in the proper sense (multiple dependent variables) then, yes, y will be 2D

#

but it is common to say "multivariate" to mean "multiple" (which is, strictly speaking, wrong) in the sense of multiple independent variables

#

in the case above, you passed a 2D array for y, which is why you got a 2D array for your coefficients

#

because it's one 1D array for each dependent variable
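
The effect on the coefficients can be seen with a small experiment. This is a sketch, assuming scikit-learn's `LinearRegression` and invented data that lies exactly on y = 3x + 1:

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Made-up data lying exactly on y = 3x + 1.
X = np.array([[1.0], [2.0], [3.0], [4.0]])   # shape (N, 1): always 2D
y_1d = np.array([4.0, 7.0, 10.0, 13.0])      # shape (N,)
y_2d = y_1d.reshape(-1, 1)                   # shape (N, 1)

coef_1d = LinearRegression().fit(X, y_1d).coef_
coef_2d = LinearRegression().fit(X, y_2d).coef_

print(coef_1d.shape)  # one coefficient per feature
print(coef_2d.shape)  # one row of coefficients per target column
```

The fitted slope is the same in both cases; only the container changes, because a 2D `y` is treated as a set of target columns.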

lapis sequoia Mar 23, 2020, 1:52 PM

#

Hi All, does any of you have code for a numpy based CNN to share for own pictures? (~480x360 pixel input)

acoustic forge Mar 23, 2020, 2:48 PM

#

Does anyone have an updated Data Science roadmap/long-term tutorial one could follow(Including maths and programming etc)? I am currently finishing my bachelors degree, and would like to practice Data Science at the same time

#

Ping me if someone has something like this :))

floral mantle Mar 23, 2020, 3:33 PM

#

trying to learn how to code by transforming the JHU COVID-19 data into a new df normalizing all countries to the day they hit 100+ cases
and I'm really struggling with some basic groupby stuff to shift the JHU data from state-level to country-level, it's blanking out my df

#

posted some stuff in #help-chestnut then realized that this is likely more geared to analytics arena of python

floral mantle Mar 23, 2020, 5:15 PM

#

specifically, how do I fix this? I want the output to be a grouped table at the country level, by day

#

import pandas as pd
import numpy as np

#this should link to the raw CSV of the latest time series data
url = "https://raw.githubusercontent.com/CSSEGISandData/COVID-19/master/csse_covid_19_data/csse_covid_19_time_series/time_series_19-covid-Confirmed.csv"
df = pd.read_csv(url)

#unpivot data
df = pd.melt(df, id_vars=['Province/State','Country/Region','Lat','Long'], var_name='date', value_name='Confirmed Cases')
df.to_csv('Working File.csv',index=True)

#create flag and custom fields
#df["date"] = pd.to_datetime(df["date"], format='%m/%d/%y', errors='ignore')
#df["Confirmed Cases"] = pd.to_numeric(df["Confirmed Cases"])
df["Flag"] = np.nan
df["DayZeroIndex"] = np.nan

df = df.groupby(['Country/Region','date','DayZeroIndex','Flag']).agg({'Confirmed Cases': 'sum'}) #******ERROR IS HERE******
df = df.sort_values(by=['Country/Region','Confirmed Cases'], ascending=True)
df.to_csv('Working File2.csv',index=True)

#

Flagged the line that I think the error is on
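
One likely cause, offered as an assumption since nobody confirmed it in the channel: `groupby` drops rows whose key is NaN by default, and both `Flag` and `DayZeroIndex` are entirely NaN at that point, so every row is discarded. A minimal sketch with a tiny invented frame in the same long format:

```python
import numpy as np
import pandas as pd

# Tiny invented frame in the same long format that pd.melt produces above.
df = pd.DataFrame({
    "Province/State": ["New York", "Washington", "New York", "Washington"],
    "Country/Region": ["US", "US", "US", "US"],
    "date": ["1/22/20", "1/22/20", "1/23/20", "1/23/20"],
    "Confirmed Cases": [20, 10, 25, 12],
})
df["Flag"] = np.nan

# Grouping on an all-NaN column: every row has a missing key, so nothing survives.
empty = df.groupby(["Country/Region", "date", "Flag"]).agg({"Confirmed Cases": "sum"})

# Parse the two-digit-year dates, then group only on the real keys.
df["date"] = pd.to_datetime(df["date"], format="%m/%d/%y")
by_country = (
    df.groupby(["Country/Region", "date"], as_index=False)["Confirmed Cases"].sum()
)
print(len(empty))
print(by_country)
```

Adding the custom fields after the aggregation, rather than before it, keeps them out of the grouping keys.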

lapis sequoia Mar 23, 2020, 5:36 PM

#

Is there anyone with object tracking experience in python that is willing to teach me or knows a good way to learn it?

willow karma Mar 23, 2020, 5:37 PM

#

Hey @floral mantle so when you actually perform your melt, you have your confirmed cases cohorted by country, but it looks like cumulative totals aren't being calculated

floral mantle Mar 23, 2020, 5:40 PM

#

they're split by country & state & date
New York US 1/1/20 20
Washington US 1/1/20 10

#

and I'm wanting to group it on country
US 1/1/20 30

#

then, in a really poor way most likely, I'll add in a couple of custom fields to do an indexed plot

#

Alternatively the data is already compiled the right way at https://ourworldindata.org/grapher/covid-confirmed-cases-since-100th-case?time=0..62 if I could figure out how to link the csv in the data tab into my python df

Our World in Data

Total confirmed cases of COVID-19

The starting point for each country is the day that country had reached 100 confirmed cases
This allows us to compare the trajectory of confirmed cases between countries.
The number of confirmed cases is lower than the number of total cases. The main reason for this is limited...

willow karma Mar 23, 2020, 5:46 PM

#

@floral mantle if you dont see a way to do this easily, you can definitely run some type of BeautifulSoup/Selenium job to download this information on some recurring basis

floral mantle Mar 23, 2020, 5:46 PM

#

yeah only challenge that I'm seeing with downloading the data is that they serve it as a blob:http:// and I don't know how to read that in

willow karma Mar 23, 2020, 5:53 PM

#

So @floral mantle - this first StackOverflow post (https://stackoverflow.com/questions/48404681/python-how-to-download-csv-files-using-selenium) and this other one (https://stackoverflow.com/questions/22676/how-do-i-download-a-file-over-http-using-python) should get you pretty far

icy ginkgo Mar 23, 2020, 8:31 PM

#

is there a discord server strictly for jupyter?

somber hamlet Mar 23, 2020, 8:59 PM

#

Afaik no

real wigeon Mar 24, 2020, 2:14 AM

#

so I have to come up with a sorting algorithm

#

and a way to standardize string inputs

#

not sure where to start

#

so like for example im taking items from vendors and coming up with an algo to name these items so that they can be categorized and searched with ease

frail frigate Mar 24, 2020, 8:12 AM

#

Hello, I have a problem related to matrix multiplication in Python. I tried it in C++ using the IKJ algorithm, and times were around 20 seconds for a 2000x2000 matrix times another 2000x2000 matrix. The problem is that when I used exactly the same code in Python with multithreading / multiprocessing, the times got absurdly high: with multithreading, a 2000x2000 matrix times another of the same size ran for about 5 hours and 40 minutes

velvet thorn Mar 24, 2020, 9:15 AM

#

@frail frigate what data structure are you using?

#

Python lists?

frail frigate Mar 24, 2020, 9:16 AM

#

2D arrays with numpy

#

if that's what you meant @velvet thorn , sorry if not, I'm completely new with using python

velvet thorn Mar 24, 2020, 9:38 AM

#

yup, that's what I meant

#

what do you mean "same code"?

#

that seems a bit long...

frail frigate Mar 24, 2020, 9:42 AM

#

same code, meaning same algorithm, used IKJ algorithm on both

#

here's the code snippet

#


import threading
import time

import numpy

def multiplicationThreading(threadAmount, size):
    dividedAmountThreads = size // threadAmount

    threads_list = []
    count = 0
    for thread in range(threadAmount):
        new_thread = threading.Thread(name = thread + 1, target = multiplicationParallel, args=(count, dividedAmountThreads, size,))
        threads_list.append(new_thread)
        count += 1

    start_time = time.time()
    print('Start parallel execution with',threadAmount,'threads for matrixes',size,'x',size)

    for thread in threads_list:
        thread.start()

    for thread in threads_list:
        thread.join()

    print('Execution time:', time.time() - start_time,'seconds')

def multiplicationParallel(threadCount, dividedAmountThreads, size):

    random_matrix_a = numpy.random.randint(0, 1000,(size, size))
    random_matrix_b = numpy.random.randint(0, 1000,(size, size))

    blank_matrix = numpy.zeros(shape=(size, size), dtype = int)

    for i in range((dividedAmountThreads*threadCount), dividedAmountThreads*(threadCount+1)):
        for k in range(size):
            for j in range(size):
                blank_matrix[i][j] += (random_matrix_a[i][k] * random_matrix_b[k][j])

In main method calling the thread function like:


for size in sizes:
        for thread in threadAmount:
            now = datetime.now()
            current_time = now.strftime("%H:%M:%S")
            print("Current Time =", current_time) 
            multiplicationThreading(thread, size)

@velvet thorn
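
A possible explanation, stated as an assumption rather than something measured on the code above: the triple loop runs one interpreted Python operation per element, and CPython threads do not execute that bytecode in parallel, so adding threads does not help. Each thread in the snippet also builds its own pair of random matrices. The sketch below compares the same i-k-j loop with NumPy's built-in `@` operator on a small invented input:

```python
import time
import numpy as np

def matmul_ikj(a, b):
    """Same i-k-j loop order as the snippet above, executed element by element in Python."""
    n, m, p = a.shape[0], a.shape[1], b.shape[1]
    out = np.zeros((n, p), dtype=np.int64)
    for i in range(n):
        for k in range(m):
            for j in range(p):
                out[i, j] += a[i, k] * b[k, j]
    return out

rng = np.random.default_rng(0)
size = 60  # kept small so the Python loop finishes quickly
a = rng.integers(0, 1000, (size, size))
b = rng.integers(0, 1000, (size, size))

t0 = time.perf_counter()
slow = matmul_ikj(a, b)
t1 = time.perf_counter()
fast = a @ b  # the whole product in one call into compiled code
t2 = time.perf_counter()

print("loop:", t1 - t0, "s; a @ b:", t2 - t1, "s")
print("same result:", np.array_equal(slow, fast))
```

Timings depend on the machine, so none are quoted here; the point is only that both paths give the same matrix.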

uncut shadow Mar 24, 2020, 10:23 AM

#

Hey. I have a pure theoretical question about RNNs and stuff like that. Let's say I have a long sequence of words

How can I turn them into numbers?
When I'll turn them into numbers and make network process it, how can I turn them back into words or letters?

oblique belfry Mar 24, 2020, 12:11 PM

#

@uncut shadow You will need to turn the words into a vector. The famous algorithm word2vec does this. There has been a lot of advancement in this space as of late, so you should def do more research. But the general premise is getting text -> clean text -> remove stop words -> lemmatization -> vectorization -> feed into model.

The model will output another vector, and this vector corresponds with the vectorization process. If you turned the word into a vector, the same idea can be used to turn the vector into words.

NLP isn't my expertise, but hope this at least gives you a head start.
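
The round trip from words to numbers and back can be shown with something much simpler than word2vec: a plain vocabulary lookup. This sketch assumes a toy sentence and integer ids (not learned vectors):

```python
sentence = "the cat sat on the mat"
words = sentence.split()

# Build the vocabulary: one integer id per distinct word, in order of first appearance.
word_to_id = {}
for word in words:
    if word not in word_to_id:
        word_to_id[word] = len(word_to_id)
id_to_word = {i: w for w, i in word_to_id.items()}

encoded = [word_to_id[w] for w in words]      # words -> numbers
decoded = [id_to_word[i] for i in encoded]    # numbers -> words

print(encoded)
print(" ".join(decoded))
```

A network that predicts an id (or a score per id) can be mapped back to text through `id_to_word` in the same way.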

uncut shadow Mar 24, 2020, 12:36 PM

#

thanks! I'll try that

oblique belfry Mar 24, 2020, 12:40 PM

#

nltk and spacy have good docs and should help you out.

obsidian copper Mar 24, 2020, 4:25 PM

#

any good sources for learning real time video classification??

#

using CNN+LSTM perhaps

lapis sequoia Mar 24, 2020, 4:41 PM

#

I just want to start learning about data analysis or data scientist stuff. can u guys suggest me the best place to start?

uncut shadow Mar 24, 2020, 4:51 PM

#

@lapis sequoia check DataCamp

#

@lapis sequoia first chapters of courses are free but then you will have to pay (actually, there is other way around -> https://www.quora.com/How-do-I-access-DataCamp-courses-for-free)

lapis sequoia Mar 24, 2020, 5:03 PM

#

Thank you @uncut shadow

uncut shadow Mar 24, 2020, 5:03 PM

#

👍

oblique belfry Mar 24, 2020, 5:06 PM

#

@obsidian copper What are you looking for? Are you wanting to classify an entire video, or are you wanting to classify objects in a video?

obsidian copper Mar 24, 2020, 5:31 PM

#

@oblique belfry it's the same hand gestures thingy dude

oblique belfry Mar 24, 2020, 5:31 PM

#

https://github.com/jinwchoi/awesome-action-recognition

GitHub

jinwchoi/awesome-action-recognition

A curated list of action recognition and related area resources - jinwchoi/awesome-action-recognition

floral mantle Mar 24, 2020, 9:02 PM

#

df["DayZeroIndex"] = pd.to_numeric(df["DayZeroIndex"], downcast='integer')

#

Trying to convert that field in my dataframe to drop the decimals

#

Right now it shows 0.0, 1.0, 2.0, 3.0 -- I just want 0, 1, 2, 3

#

How do I do that?

silent swan Mar 25, 2020, 2:12 AM

#

I would just do .astype(int)

#

depends on what you're trying to know about DL

#

presumably it'd be just setting the derivative of the log likelihood to zero

#

but typically deep learning models are non-convex so there shouldn't be a unique global minimum

#

(or rather maximum, in terms of likelihood)

#

aha

#

yup

#

err not particularly, other than the Deep Learning book most of the content for DL models is either in papers or lecture notes of very new classes

#

although books like Murphy will always be relevant

frail flower Mar 25, 2020, 2:43 AM

#

Trying to come up with a minimal environment.yml to use as my default setup in the future. Here's what I have so far:

name: minimal
channels:
        - conda-forge
dependencies:
        - python=3.7
        - pandas
        - scikit-learn
        - matplotlib
        - jupyterlab

#

Any other suggestions for the bare minimum needed for the majority of projects?

silent swan Mar 25, 2020, 2:59 AM

#

tqdm/seaborn are nice but not necessary

frail flower Mar 25, 2020, 3:00 AM

#

I've used seaborn a lot (great for violin plots), but what's tqdm?

silent swan Mar 25, 2020, 3:00 AM

#

progress bar

frail flower Mar 25, 2020, 3:01 AM

#

neat

#

Oh, as in the one that's in the pandarallel demo?

silent swan Mar 25, 2020, 3:02 AM

#

not sure about that, but you can see it here: https://github.com/tqdm/tqdm

GitHub

tqdm/tqdm

A Fast, Extensible Progress Bar for Python and CLI - tqdm/tqdm

frail flower Mar 25, 2020, 3:02 AM

#

https://github.com/nalepae/pandarallel this is probably the best extension to pandas i've seen yet

GitHub

nalepae/pandarallel

A simple and efficient tool to parallelize Pandas operations on all available CPUs - nalepae/pandarallel

silent swan Mar 25, 2020, 3:03 AM

#

cool, I'd not heard of it

frail flower Mar 25, 2020, 3:03 AM

#

Especially when you're dealing with something huge like ERA5.

floral mantle Mar 25, 2020, 12:22 PM

#

hey guys - dataframe filter question

#

maybe I have to get it another way though

#

I have a process that updates all COVID-19 cases, by country, by day from the JHU github source.
It normalizes the data to a DayZeroIndex for each country where that is the day each country hit >= 100 cases.
I realized that I need to cut off the last DayZeroIndex for each country since the data isn't finalized until the next morning.
I'm using the filter below to remove anything where DayZeroIndex = -1 (meaning < 100 cases for the country).
What do I use?

#

df = df[df["DayZeroIndex"] != -1]
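
One way to drop each country's most recent day is to compare every row against that country's maximum index. This is a sketch on invented rows, assuming the column names used in the message above:

```python
import pandas as pd

# Invented example rows; -1 marks days before a country reached 100 cases.
df = pd.DataFrame({
    "Country/Region": ["US", "US", "US", "US", "Italy", "Italy", "Italy"],
    "DayZeroIndex":   [-1,   0,    1,    2,    0,       1,       2],
    "Confirmed Cases": [80, 120, 200, 310, 150, 230, 320],
})

df = df[df["DayZeroIndex"] != -1]

# Largest DayZeroIndex within each country, aligned row by row with df.
last_day = df.groupby("Country/Region")["DayZeroIndex"].transform("max")

# Keep everything except each country's most recent (still provisional) day.
trimmed = df[df["DayZeroIndex"] < last_day]
print(trimmed)
```

`transform("max")` returns a Series with the same index as `df`, which is what makes the row-wise comparison possible.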

#

@silent swan thank you for the astype(int) suggestion. Will give it a shot. Honestly, I was shotgun approaching the whole thing since I kept getting errors. I think the reason I was having so much trouble is that my DayZeroIndex field originally was set to np.NaN and astype(int) doesn't handle that well so it stayed float64. I changed the default value to -1 though, so maybe it works now

**Update: Worked like a charm and that's a lot cleaner than the to_numeric/downcast solution.**
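
The NaN issue described above can be reproduced in a few lines. This sketch uses an invented Series and shows two ways around it: the sentinel value mentioned in the message, and pandas' nullable integer dtype as an alternative.

```python
import numpy as np
import pandas as pd

s = pd.Series([0.0, 1.0, np.nan, 3.0], name="DayZeroIndex")

# NaN has no integer representation, so a plain cast fails.
try:
    s.astype(int)
    cast_failed = False
except ValueError:
    cast_failed = True

# Option 1: use a sentinel such as -1 for "not set", as in the message above.
with_sentinel = s.fillna(-1).astype(int)

# Option 2: pandas' nullable integer dtype keeps the gap as <NA>.
nullable = s.astype("Int64")

print(cast_failed, with_sentinel.tolist(), nullable.dtype)
```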

dusky cairn Mar 25, 2020, 4:34 PM

#

Can anyone help me do a polynomial regression?

uncut shadow Mar 25, 2020, 6:37 PM

#

Hey. I have a question about RNNs. I have seen many mentions of an RNN cell and I'm wondering, aren't those layers? Or maybe RNNs and LSTMs are cells? I mean, I have heard that the number of cells has to equal the number of single time-steps

silent swan Mar 25, 2020, 7:57 PM

#

@uncut shadow there's two dimensions to consider

#

how many timesteps you're going to process the inputs for (using the same cell each time)

uncut shadow Mar 25, 2020, 7:58 PM

#

hmm

silent swan Mar 25, 2020, 7:58 PM

#

and how many layers of RNN cells/LSTMs you have (these are usually different)
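
The two dimensions can be made concrete with a small NumPy sketch. It assumes a vanilla tanh RNN cell and invented sizes; `make_cell` and `run_layer` are hypothetical helper names, not a library API:

```python
import numpy as np

rng = np.random.default_rng(0)

def make_cell(input_size, hidden_size):
    """Weights for one vanilla RNN cell (hypothetical helper, not a library API)."""
    return {
        "W_x": rng.normal(scale=0.1, size=(input_size, hidden_size)),
        "W_h": rng.normal(scale=0.1, size=(hidden_size, hidden_size)),
        "b": np.zeros(hidden_size),
    }

def run_layer(cell, inputs):
    """Apply the SAME cell at every timestep, carrying the hidden state forward."""
    h = np.zeros(cell["b"].shape[0])
    outputs = []
    for x_t in inputs:                      # one loop iteration per timestep
        h = np.tanh(x_t @ cell["W_x"] + h @ cell["W_h"] + cell["b"])
        outputs.append(h)
    return np.stack(outputs)

timesteps, input_size, hidden_size = 12, 1, 8
sequence = rng.normal(size=(timesteps, input_size))

layer1 = make_cell(input_size, hidden_size)    # layer 1: one cell, reused 12 times
layer2 = make_cell(hidden_size, hidden_size)   # layer 2: a different cell with its own weights

out1 = run_layer(layer1, sequence)
out2 = run_layer(layer2, out1)
print(out1.shape, out2.shape)
```

The number of timesteps only changes how many times the loop runs; the number of layers changes how many separate sets of weights exist.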

uncut shadow Mar 25, 2020, 7:58 PM

#

so

#

what about the first one

silent swan Mar 25, 2020, 7:59 PM

#

what about the first one

#

@bowy could you elaborate what your confusion is? the math gets pretty yucky because you have to sum the gradients over the different spatial locations, so some articles may simplify it

uncut shadow Mar 25, 2020, 8:02 PM

#

well, I have another question. In Dense NNs I could use different activation functions which I could choose. In RNN or LSTM I see there is tanh, sigmoid and softmax. Does it mean, I can only use those 3?

silent swan Mar 25, 2020, 8:03 PM

#

within the LSTMs, there're specific configurations of activation functions. don't change those

uncut shadow Mar 25, 2020, 8:03 PM

#

hmm

#

Ok, thanks

silent swan Mar 25, 2020, 8:03 PM

#

otoh, almost no one uses vanilla RNNs

uncut shadow Mar 25, 2020, 8:03 PM

#

ohh

#

cuz of the vanishing gradient?

#

also, if I have a sequence 110011001100... and I want network to predict next 4 numbers how many cells should I use?

quartz cedar Mar 25, 2020, 9:22 PM

#

hello guys

#

is anyone here familiar with the matplotlib library?

worn stratus Mar 25, 2020, 9:38 PM

#

People will be. Just ask your question and someone will respond if they know

quartz cedar Mar 25, 2020, 9:43 PM

#

okay so i've been having a problem with the matplotlib library

#

I'm trying to create a barchart from the list that i have

#

but for some odd reason it won't plot both of them properly on the graph

#

I'll send you guys the code

#

one sec

worn stratus Mar 25, 2020, 9:45 PM

#

What does the result come out like and what do you expect it to come out like?

uncut shadow Mar 25, 2020, 9:46 PM

#

which is better?

Enumerating and giving every symbol in sentence an unique number?
Using one-hot encoding?
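
Both options can be written out side by side. This sketch uses the `110011001100` sequence from earlier in the conversation as input. As a general observation rather than a rule, integer ids imply an ordering between symbols, while one-hot rows treat every symbol as equally distinct at the cost of a wider input.

```python
import numpy as np

text = "110011001100"
symbols = sorted(set(text))                       # ['0', '1']
symbol_to_id = {s: i for i, s in enumerate(symbols)}

# Option 1: integer ids, one number per symbol.
ids = np.array([symbol_to_id[s] for s in text])

# Option 2: one-hot rows, one column per distinct symbol.
one_hot = np.eye(len(symbols), dtype=int)[ids]

print(ids[:4])
print(one_hot[:4])
```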

quartz cedar Mar 25, 2020, 9:46 PM

#

i get these 2 results

#

but the problem is

#

the total free games = 1087200000

#

and the total paid games = 900000

#

but for some reason the paid bar is incorrect, it's even supposed to be a different colour as you can see

#

so i don't know what to do

#

I'm very lost

#

@worn stratus any ideas?

#

I'm desperate lol

worn stratus Mar 25, 2020, 9:53 PM

#

I'm not really an expert at all with matplotlib

#

so I can't help much

quartz cedar Mar 25, 2020, 9:54 PM

#

😦 oh okay

#

waits patiently

quartz cedar Mar 25, 2020, 11:17 PM

#

is anyone here?

#

i srsly need some help in this

worn stratus Mar 25, 2020, 11:20 PM

#

The best way to get help is to have a concise summary of your problem, and a link to your code as your most recent message - then just lots of patience

quartz cedar Mar 25, 2020, 11:25 PM

#

sighhhhh

#

okey okey

#

i kinda need help now tho

polar acorn Mar 25, 2020, 11:33 PM

#

How about something like this for the plotting?

plt.bar(x, [free_sum, paid_sum], color=['b', 'r'], label=["FREE", "PAID"])
plt.ylabel("Scores")
plt.title('Total Free sum vs Total Paid Sum')
plt.xticks(x)
plt.show()

You'll notice that one column doesn't actually show up, but that is only because with the values you gave it's too small. Set them to similar values to see the plot as it's supposed to look.

quartz cedar Mar 25, 2020, 11:40 PM

#

u see the problem is

#

i can't set them to similar values

#

they have to be the values that i have

#

@polar acorn which is "1087200000" and "900000"

#

i tried to change it to this

#




labels = ['Free','Paid']
x = np.arange(n_groups)
bar_width = 0.25


fig, ax = plt.subplots()
rects1 = ax.bar(index,free_sum,bar_width, color='b', label='FREE')
rects2 = ax.bar(index + bar_width+0.2, paid_sum, color='r', label='PAID')

ax.set_title('Total Free sum vs Total Paid Sum')
ax.set_xticks(index)
ax.set_xticklabels(labels)
plt.show()

#

but now only one bar shows up

#

📎 unknown.png

#

I know that the free bar is correct

#

but the paid bar just doesn't appear

polar acorn Mar 25, 2020, 11:43 PM

#

It's too small to show. Look at the numbers: one of them is over a thousand times larger than the other.

quartz cedar Mar 25, 2020, 11:44 PM

#

why is the number on the y axis small

#

and how do you fix that?

#

because the question tells me to get the sum of all the installs of the free apps and the paid apps

#

i got the sum for both

#

but i'm struggling to plot them

polar acorn Mar 25, 2020, 11:45 PM

#

If you use the code I pasted it will plot them correctly. One of the columns will not be visible, but that is correct: the number of free games is so much bigger that the other column will simply not be visible.

quartz cedar Mar 25, 2020, 11:47 PM

#

i'm not sure

#

because my lecturer somehow managed to plot them

#

she gave me this code

#

but for some reason for me it doesn't work

#


fig, ax = plt.subplots()

index = np.arange(n_groups)
bar_width = 0.25

rects1 = ax.bar(index, Freecount, bar_width,
                alpha=opacity, color='b', error_kw=error_config,
                label='FREE')

# This sets up the second bar in the chart. The first argument decides where to display this bar: it is set to index + bar_width (the width of the first bar in the chart) plus 0.2 for the space in between.
rects2 = ax.bar(index + bar_width + 0.2, costCount, bar_width,
                alpha=opacity, color='r',
                error_kw=error_config,
                label='PAID')

#

this is what she wrote

polar acorn Mar 25, 2020, 11:51 PM

#

That code works fine as well. The problem is not with the plotting code I gave you or the one she gave you. The problem is that you are plotting one column that is 1000 times as tall as the other one so the second column isn't visible at all

quartz cedar Mar 25, 2020, 11:52 PM

#

so what am i supposed to do?

#

just leave it invisible?

#

because when i saw her answer

#

there was 2 visible bars

#

so idk

polar acorn Mar 25, 2020, 11:53 PM

#

That means her numbers are different from yours, maybe you made some error in coming up with those numbers?

#

Maybe you are using different data?

quartz cedar Mar 25, 2020, 11:54 PM

#

nah man

#

it's the same dataset

polar acorn Mar 25, 2020, 11:55 PM

#

If you still want to plot the numbers you have you can change the y axis to be logarithmic such as I have done here:

plt.bar(x, [free_sum, paid_sum], color=['b', 'r'], label=["FREE", "PAID"])
plt.semilogy()
plt.ylabel("Scores")
plt.title('Total Free sum vs Total Paid Sum')
plt.xticks(x, ["FREE", "PAID"])
plt.show()
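
A self-contained version of the log-axis idea, using the two totals quoted in the conversation. This is a sketch: the off-screen `Agg` backend and the output filename are assumptions made so it runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # render off-screen; drop this line to get an interactive window
import matplotlib.pyplot as plt

free_sum = 1_087_200_000  # totals quoted in the conversation
paid_sum = 900_000

fig, ax = plt.subplots()
bars = ax.bar(["FREE", "PAID"], [free_sum, paid_sum], color=["b", "r"])
ax.set_yscale("log")       # log axis: each major tick is 10x the previous one
ax.set_ylabel("Total installs (log scale)")
ax.set_title("Total Free sum vs Total Paid Sum")
fig.savefig("free_vs_paid.png")

ratio = free_sum / paid_sum
print(f"free is {ratio:.0f}x paid")
```

On a linear axis the smaller bar would be roughly a thousandth of the taller one's height, which is why it seems to vanish.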

arctic wedgeBOT Mar 26, 2020, 12:00 AM

#

Hey @quartz cedar!

It looks like you tried to attach file type(s) that we do not allow (.csv). We currently allow the following file types: .3gp, .3g2, .avi, .bmp, .gif, .h264, .jpg, .jpeg, .m4v, .mkv, .mov, .mp4, .mpeg, .mpg, .png, .tiff, .wmv, .svg, .psd, .ai, .aep, .xcf, .mp3, .wav, .ogg, .md.

Feel free to ask in #community-meta if you think this is a mistake.

quartz cedar Mar 26, 2020, 12:13 AM

#

is it not possible to separate them?

#

like instead of having them like one on top of eachother

#

@polar acorn

polar acorn Mar 26, 2020, 12:14 AM

#

To separate what? The columns?

quartz cedar Mar 26, 2020, 12:16 AM

#

yes

#

just to make it look like a normal bar chart

polar acorn Mar 26, 2020, 12:21 AM

#

It is a normal bar chart. There is nothing wrong with the plotting code that I and your supervisor gave you. The plotting works just as intended. Try it yourself: find a piece of paper and draw a column 5 centimeters tall, and next to it draw a column 10 micrometers tall. You won't see the second column there either. Try the code I or your supervisor gave you but replace free_sum with 100 and paid_sum with 90, the plot will look just fine. The plotting is fine, the numbers are wrong.

quartz cedar Mar 26, 2020, 12:22 AM

#

this is really frustrating

#

ya i understand

oblique belfry Mar 26, 2020, 12:25 AM

#

@uncut shadow How big is the sequence?

lapis sequoia Mar 26, 2020, 1:28 AM

#

im trying to write a basic neural network

#

but my loss keeps increasing

#

but im pretty sure my calculations are correct

#

def sigmoid(x):
    global sigmoid
    return 1.0 / (1 + np.exp(-x))

def sig_deriv(x):
    global sigmoid
    return (sigmoid(x) * (1 - sigmoid(x)))


class NeuralNetwork:
    def __init__(self, x, y):
        self.x        = x
        self.y        = y
        self.weight1  = np.random.normal()
        self.weight2  = np.random.normal()
        self.bias1    = np.random.normal()
        self.bias2    = np.random.normal()
        self.output   = np.zeros(self.y.shape)
        self.rate     = 0.1


    def feedforward(self):
        self.neuron1 = sigmoid((self.x * self.weight1) + self.bias1)
        self.output  = sigmoid((self.neuron1 * self.weight2) + self.bias2)


    def backprop(self):
        dloss_dy  = -2 * (1 - self.output)
        dout_dn1  = self.weight2 * sig_deriv((self.weight2 * self.neuron1) + self.bias1)
        dn1_dw1   = self.x * sig_deriv((self.weight1 * self.x) + self.bias1)
        dout_dw2  = self.neuron1 * sig_deriv((self.weight2 * self.neuron1) + self.bias2)
        dn1_db1   = sig_deriv((self.weight1 * self.x) + self.bias1)
        dout_db2  = sig_deriv((self.weight2 * self.neuron1) + self.bias2)

        self.weight1 -= self.rate * dloss_dy * dout_dn1 * dn1_dw1
        self.weight2 -= self.rate * dloss_dy * dout_dw2
        self.bias1   -= self.rate * dloss_dy * dout_dn1 * dn1_db1
        self.bias2   -= self.rate * dloss_dy * dout_db2

#

i only have one input and one hidden layer

#

apologies if it kinda messy
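
Two things in `backprop` look inconsistent with the forward pass. They are offered as likely causes, not confirmed in the channel. First, `dloss_dy` never uses `self.y`; for a squared-error loss it would be `-2 * (y - output)`. Second, `dout_dn1` evaluates the sigmoid derivative with `self.bias1`, where `feedforward` uses `self.bias2`. The sketch below is the same 1-1-1 network with those two changes, on invented data, averaging gradients over samples so each parameter stays a scalar:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def sig_deriv(x):
    s = sigmoid(x)
    return s * (1.0 - s)

class TinyNetwork:
    """One input, one hidden neuron, one output, trained on mean squared error."""

    def __init__(self, x, y, rate=0.1, seed=0):
        rng = np.random.default_rng(seed)
        self.x, self.y, self.rate = x, y, rate
        self.weight1, self.weight2, self.bias1, self.bias2 = rng.normal(size=4)

    def feedforward(self):
        self.z1 = self.x * self.weight1 + self.bias1
        self.neuron1 = sigmoid(self.z1)
        self.z2 = self.neuron1 * self.weight2 + self.bias2
        self.output = sigmoid(self.z2)
        return np.mean((self.y - self.output) ** 2)

    def backprop(self):
        dloss_dout = -2.0 * (self.y - self.output)   # uses the target y
        delta2 = dloss_dout * sig_deriv(self.z2)     # same pre-activation as the forward pass
        delta1 = delta2 * self.weight2 * sig_deriv(self.z1)

        # Average over samples so each parameter stays a scalar.
        grad_w2 = np.mean(delta2 * self.neuron1)
        grad_b2 = np.mean(delta2)
        grad_w1 = np.mean(delta1 * self.x)
        grad_b1 = np.mean(delta1)

        self.weight1 -= self.rate * grad_w1
        self.weight2 -= self.rate * grad_w2
        self.bias1 -= self.rate * grad_b1
        self.bias2 -= self.rate * grad_b2

x = np.array([0.0, 0.25, 0.5, 0.75, 1.0])
y = np.array([0.1, 0.2, 0.5, 0.8, 0.9])   # invented targets in (0, 1)

net = TinyNetwork(x, y)
first_loss = net.feedforward()
for _ in range(5000):
    net.feedforward()
    net.backprop()
last_loss = net.feedforward()
print(first_loss, last_loss)
```

Storing the pre-activations `z1` and `z2` during the forward pass avoids recomputing them with the wrong bias later.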

final anvil Mar 26, 2020, 1:32 AM

#

how would u make a basic neural network in python and for that do i need to know anything higher than algebra

oblique belfry Mar 26, 2020, 2:26 AM

#

You will need to know basic calculus. Gradient descent is key.

kindred stirrup Mar 26, 2020, 3:26 AM

#

Hey all. Anyone familiar with how to add external regressors to AutoARIMA? I’m trying to forecast a series where the holidays change dates like Ramadan or Lunar New Year. Any ideas would be appreciated

polar acorn Mar 26, 2020, 7:53 AM

#

I don't know if you can add external regressors to AutoARIMA, if you have many of them you could do a multivariate linear regression and then model the errors with an ARIMA model. Or you could try out fbprophet as that is a nice library that easily includes external regressors such as moving holidays.

glad roost Mar 26, 2020, 11:18 AM

#

Hi all, I am trying to predict the results of football matches with Poisson regression. How can I improve my accuracy? (I have 40-50% accuracy right now)

Link for the Telegram bot I made if you wanna try (unfortunately it's Turkish): https://web.telegram.org/#/im?p=@MacTahminBot
Code in github: https://github.com/umitkaanusta/MacTahminBotu

GitHub

umitkaanusta/MacTahminBotu

A bot that provides soccer predictions by using Poisson regression. Currently on Telegram - umitkaanusta/MacTahminBotu

polar acorn Mar 26, 2020, 11:44 AM

#

What coefficients are you estimating right now?

glad roost Mar 26, 2020, 11:57 AM

#

I'm trying to calculate attacking and defending "powers" for each team, based on their goals for-goals against statistics. There's also the home advantage for the league
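
For readers unfamiliar with this kind of model, here is a sketch of how team ratings become match probabilities. It assumes a common parameterization in which the log of each side's expected goals is a sum of attack, defence and home-advantage terms. The actual bot may differ, and all numbers below are invented.

```python
import math

def poisson_pmf(k, lam):
    """Probability of exactly k goals when the expected number is lam."""
    return math.exp(-lam) * lam ** k / math.factorial(k)

def match_probabilities(home_rate, away_rate, max_goals=10):
    """Sum independent Poisson scorelines into home-win / draw / away-win probabilities."""
    home_win = draw = away_win = 0.0
    for h in range(max_goals + 1):
        for a in range(max_goals + 1):
            p = poisson_pmf(h, home_rate) * poisson_pmf(a, away_rate)
            if h > a:
                home_win += p
            elif h == a:
                draw += p
            else:
                away_win += p
    return home_win, draw, away_win

# Invented ratings on the log scale (hypothetical values, not fitted to any league).
home_advantage = 0.25
attack = {"A": 0.30, "B": 0.05}
defence = {"A": -0.10, "B": 0.15}   # positive = concedes more

home_rate = math.exp(home_advantage + attack["A"] + defence["B"])
away_rate = math.exp(attack["B"] + defence["A"])

p_home, p_draw, p_away = match_probabilities(home_rate, away_rate)
print(round(p_home, 3), round(p_draw, 3), round(p_away, 3))
```

Because the two goal counts are treated as independent here, the draw probability comes only from the diagonal of the scoreline grid.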

eager heath Mar 26, 2020, 12:30 PM

#

It sounds already like a good accuracy tbh, if you only use the previous results as your input data

polar acorn Mar 26, 2020, 1:35 PM

#

I've used that model before and heard back then that something that is often done is to increase the likelihood of 1-1 draws as they are more likely in real life than in this model. You can check with your data if that's the case and maybe add a small correction.

#

Also a general suggestion I got back then that I never followed up was to treat the attack and defence powers as time series and allow them to change throughout the league, no idea how I would implement that though 🤷‍♂️

glad roost Mar 26, 2020, 1:55 PM

#

Definitely, the model usually fails to guess draws. Would the use of xG (Expected goals) and xGA (Expected goals against) instead of GF-GA correct the situation with draws?

lapis sequoia Mar 26, 2020, 3:49 PM

#

Hi everyone, I found this channel through a website!
I'm new to data science and learning SQL. Do you guys recommend Coursera or Dataquest?

slow yew Mar 26, 2020, 5:43 PM

#

https://www.youtube.com/watch?v=fYVqck4iZSU

YouTube

Matt Jennings

COVID-19 Stock Market Crisis Visualised

Data visualisation project to show the spread of COVID-19 and it’s impact on the global economy.

Coronavirus cases and death statistics shown with a 2D colour matrix to represent infection and mortality within each country. Economic data from various Major World Indices’ clos...

ripe forge Mar 26, 2020, 8:00 PM

#

are there any DS based approaches to dealing with object counting across multiple cameras with overlapping FOV (field of view)? It's a tough ask, but if anyone has some resources around this topic, i'd love some recommendation.

lapis sequoia Mar 26, 2020, 9:27 PM

#

hi i'm back my question is what stuff i should learn to automate games like chess

still abyss Mar 26, 2020, 10:37 PM

#

Does anyone know of a good list of hyperparameters for each model type to tune?

oblique belfry Mar 27, 2020, 1:28 AM

#

I am SO glad Jupyter notebooks work in VSCode. I don't have to expose the jupyter port. I can just use the SSH connection I am using for remote connections.

agile anvil Mar 27, 2020, 5:08 AM

#

Hey all, if you know lmfit uncertainties and understand sigmoid curves, would you please have a look at https://repl.it/@jsalsman/COVID19USgrowthExtrapolation and @-me with some ideas for how to prevent the prediction confidence intervals from decreasing, which I suspect means I should not be trying to extrapolate a sigmoid instead of perhaps a (binomial?) time series of non-cumulative occurrences. E.g.:

📎 download.png

repl.it

jsalsman

COVID19USgrowthExtrapolation

U.S. cumulative COVID-19 infections and deaths extrapolations to logistic sigmoid growth curves

#

semi-fixed, still projecting sigmoid cumulatives instead of binomial (poisson?) non-cumulative occurrences

📎 download.png

agile anvil Mar 27, 2020, 6:53 AM

#

the correct model of the non-cumulative observations is a lognormal time series

agile anvil Mar 27, 2020, 7:14 AM

#

📎 Screen_Shot_2020-03-27_at_12.13.54_AM.png

grave lodge Mar 27, 2020, 8:05 AM

#

Hi all! Was wondering if I could get some data-related help... does anyone know what's the best method to aggregate sentences together using Python?
Unfortunately, I won't be able to provide the data as it is confidential but I am working on creating a sort of word/sentence cloud for let's say top different reasons why a process failed

Dataset:

ID | reason
1    | "The app crashed"
2   | "...crashed on it's own"
3   | "User12345 hacked into the system"
4   | "New Patch doesn't work"
5   | "Water damage to device"
6   | "User09876 hacked into the system"

so in this case, we can technically eye which reasons are similar and should be counted together (e.g. "The app crashed" and "...crashed on it's own") or (e.g. Userxyz hacked into the system).
I have already tried splitting the sentence into words and getting the top words (while excluding out the most common words such as "the" or "is") and displayed it as a word cloud, but visually speaking it is not as insightful as I had hoped.

Also, would this require NLP to achieve or not necessarily?

#

https://towardsdatascience.com/word-clouds-in-tableau-quick-easy-e71519cf507a <--- I used this as a reference to create the word cloud viz

Medium

Word Clouds in Tableau: Quick & Easy.

Using Tableau to create word clouds with ease.

proud iron Mar 27, 2020, 8:50 AM

#

hello, i'm working on intrusion detection system currently

#

and i try to apply K-means clustering algorithm using the sklearn library

#

```python
k = 30
km = KMeans(n_clusters = k)
km.fit(features)
```

#

when i get to the above stage of the code, specifically at km.fit(features) part, I encounter a MemoryError

#

MemoryError: Unable to allocate array with shape (494021, 38) and data type float64

#

from what i heard from other members of the server, the (494021, 38) array should approximately be less than 1GB of RAM for the computer to handle

#

I definitely do have enough physical memory to handle it

#

is there any possible factors that may influence/cause this?
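As a quick sanity check on the size estimate above, the arithmetic can be sketched as follows. The shape and dtype come from the error message; everything else is illustration only. A single array of this shape is well under 1 GB, so the failure probably has another cause (for example a 32-bit Python build, or several copies of the data alive at once), but that part is a guess, not something the thread confirms.

```python
import numpy as np

# Shape and dtype taken from the MemoryError message.
rows, cols = 494021, 38
bytes_needed = rows * cols * np.dtype(np.float64).itemsize
mib_needed = bytes_needed / 2**20

# Roughly 143 MiB for one copy of the array.
print(f"{bytes_needed} bytes = {mib_needed:.1f} MiB")
```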

uncut shadow Mar 27, 2020, 12:15 PM

#

So is an RNN a whole network or just one layer? Its cells pass output from one gate to another cell, and the output from the second gate is passed forward, right?

uncut shadow Mar 27, 2020, 12:46 PM

#

and should I mix it with other layers?

ripe forge Mar 27, 2020, 12:48 PM

#

both

#

and also, neither 😛

#

terms in data science mean a lot of related things. if you're talking specifically for a "layer", yeah mix and match whatever layers you like

#

although, if you're using RNN layers, generally lstm are their improved counterpart

#

so you can also be talking about RNN for "architecture", in which case RNN is just the whole network

uncut shadow Mar 27, 2020, 12:53 PM

#

well, I'm wondering cuz in Keras and TF there is Sequential which is (I have seen it many times in many networks) kinda used nearly "everywhere" and it works with dense networks, rnns and many others.

#

I don't know how to create it (from scratch). I created a dense layer (forward propagation) just by input @ weights + bias, and here the last hidden state also has to be added. Does anybody have any ideas about how to make it?

#

Another question, if I'd like to use a sentence which has 3 words in rnn then I'd need 3 cells, right?

#

Another question, let's say that the amount of words in a sentence isn't constant then what should I do? Let's say that I want to make a chatbot and I'll train it on 10 word sentences. What should I do to make sure that network outputs right sentence when user uses the command and inputs sentence with e.g. 20 words?
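A minimal from-scratch sketch of the recurrent forward pass asked about above, assuming a plain (Elman-style) RNN with a tanh activation. The names `rnn_forward`, `W_x`, `W_h` and the sizes are hypothetical, chosen only for illustration. The point it tries to show is that one cell with one set of weights is applied once per time step, so a 3-word and a 20-word sentence can go through the same layer.

```python
import numpy as np

def rnn_forward(inputs, W_x, W_h, b, h0=None):
    """Run one recurrent layer over a sequence of input vectors."""
    hidden_size = W_h.shape[0]
    h = np.zeros(hidden_size) if h0 is None else h0
    states = []
    for x_t in inputs:
        # Same idea as a dense layer, plus the previous hidden state.
        h = np.tanh(x_t @ W_x + h @ W_h + b)
        states.append(h)
    return np.stack(states)

rng = np.random.default_rng(0)
input_size, hidden_size = 5, 4  # hypothetical sizes
W_x = rng.normal(size=(input_size, hidden_size))
W_h = rng.normal(size=(hidden_size, hidden_size))
b = np.zeros(hidden_size)

# Two "sentences" of different lengths share the same weights.
short_states = rnn_forward(rng.normal(size=(3, input_size)), W_x, W_h, b)
long_states = rnn_forward(rng.normal(size=(20, input_size)), W_x, W_h, b)
print(short_states.shape, long_states.shape)
```

In practice, libraries usually pad or bucket sequences so they can be batched, but the weights themselves do not depend on the sequence length.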

slow yew Mar 27, 2020, 1:09 PM

#

Data visualization project with matplotlib / cartopy if interested:
https://www.youtube.com/watch?v=fYVqck4iZSU

YouTube

Matt Jennings

COVID-19 Stock Market Crash Visualised

Data visualisation project to show the spread of COVID-19 and it’s impact on the global economy.

Same video without stock markets:
https://youtu.be/P9jhY3U4YRQ

Coronavirus cases and death statistics shown with a 2D colour matrix to represent infection and mortality within ea...


lapis sequoia Mar 27, 2020, 2:01 PM

#

I have a doubt about DQN. Say for the nth state I am getting

qnthvalues = [1,2,3]

Here the max Q-value is selected, which is at position 2 (the 3rd value), so I take action 3 and get the Q-values for state n+1. Now should I apply the Bellman equation only for that action (the 3rd value of the n+1 Q-values) and leave the other values the same in the target?

qn+1value = [2,3,4]
targetq_values = [2,3,bellmaneq(4)]
               Or
targetq_values = bellmaneq(qn+1value)

(So will we be applying it to all Q-values, or only to the Q-value of the chosen action?)
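For reference, a sketch of how the target vector is usually built in the standard DQN update: only the entry for the action actually taken is replaced, and the other entries keep the network's current predictions so they contribute no error. The function name `dqn_target`, the reward of 1.0 and the discount of 0.9 are assumptions for illustration; none of them come from the question.

```python
import numpy as np

def dqn_target(q_current, q_next, action, reward, gamma=0.99, done=False):
    """Build the training target for one transition (standard DQN rule)."""
    target = np.array(q_current, dtype=float)  # copy of Q(s_n, .)
    # Bellman backup for the chosen action only.
    bootstrap = 0.0 if done else gamma * np.max(q_next)
    target[action] = reward + bootstrap
    return target

q_n = [1.0, 2.0, 3.0]    # Q-values for state n
q_n1 = [2.0, 3.0, 4.0]   # Q-values for state n+1
# Hypothetical reward and discount factor.
target = dqn_target(q_n, q_n1, action=2, reward=1.0, gamma=0.9)
print(target)
```

Note that in this sketch the untouched entries come from the state-n predictions, not from the n+1 values.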

supple kelp Mar 27, 2020, 5:39 PM

#

im kind of new to programming and i need help with classes

#

so i want to create a calculator in python

#

and

#

i have an overall class called math

#

and in that math class there are names of categories like geometry and calculus

#

and then i want to create a subclass for geometry that includes area and volume

#

and then another class that gives area of a square, area of a rectangle, area of a triangle

uncut shadow Mar 27, 2020, 5:42 PM

#

well, that's a good question but probably not in data science channel lol

supple kelp Mar 27, 2020, 5:47 PM

#

but i want to use class method?

#

or should i just do it in computer science

trail grove Mar 28, 2020, 9:19 AM

#

#tools-and-devops @supple kelp here

#

i have a question about sql for data science, which is the most reliable sql language?

#

for data science

feral thistle Mar 28, 2020, 12:01 PM

#

I want to find the rows where the length of items in a column are above a certain value.
The datatype of the column is string.

Looking for something along the lines of the code below

df[df.column.str.length > 1]
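pandas has no `.str.length`; the string accessor method is `.str.len()`. A small sketch with a made-up frame (the data below is hypothetical, and the column is called `column` only to mirror the snippet above):

```python
import pandas as pd

# Hypothetical data standing in for the real frame.
df = pd.DataFrame({"column": ["a", "bb", "ccc", None]})

# .str.len() gives the length of each string (NaN for missing values).
mask = df["column"].str.len() > 1
long_rows = df[mask]
print(long_rows)
```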

shrewd grotto Mar 28, 2020, 12:58 PM

#

i guess this is the right place to ask

final_array = self.eval_values(second_array)
self.map1 = final_array
self.map2 = final_array

For whatever reason any modification i do to self.map2 also happens to self.map1
self.map1 isn't used any where else in the code except for a line self.map2 = self.map1 where i try to reset the modified values in map2 with the original values from map1

#

is this some weird numpy thingie, or am i completely stupid, shouldnt self.map2 = self.map1 overwrite map2 with map1 ?

#

and also why are modifications to one happening to both

#

final_array is a numpy array btw

velvet thorn Mar 28, 2020, 2:37 PM

#

@shrewd grotto Python doesn't make implicit copies

#

you can think of self.map1 and self.map2 as pointing to the same object
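A short sketch of the aliasing described above, using plain variables instead of the `self.` attributes from the question. Assignment only binds another name to the same array; `.copy()` is what creates an independent one.

```python
import numpy as np

final_array = np.arange(4).reshape(2, 2)

# Plain assignment: both names refer to the same array object.
map1 = final_array
map2 = final_array
map2[0, 0] = 99
shared = map1[0, 0] == 99 and final_array[0, 0] == 99

# Explicit copies: each name gets its own data.
map1 = final_array.copy()
map2 = final_array.copy()
map2[1, 1] = -1
independent = map1[1, 1] != -1

# Resetting map2 from map1 also needs a copy (or map2[:] = map1),
# otherwise the two names are aliased again.
map2 = map1.copy()
print(shared, independent)
```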

shrewd grotto Mar 28, 2020, 2:38 PM

#

@velvet thorn thx

half sand Mar 28, 2020, 3:28 PM

#

Hey Guys! So at Uni we are learning R right now, and I like the language. Nevertheless I wanted to ask:

Do you think for someone who's planning on working in Data Science somewhere, is it better to invest my time in learning Python's statistics modules, or rather learn some more R?

#

I can choose how I do my homework. I could plot with R or with Python whatever. I know Python quite well, but not too much when it comes to statistical work. I don't really know R a lot.

uncut shadow Mar 28, 2020, 4:06 PM

#

@half sand R was made for data science. If you are going to use Python then ok, but knowing R won't do any harm so if you can, yes, you should learn it too

dusk elk Mar 28, 2020, 4:23 PM

#

What's the point of pd.Series.name exactly?

north jay Mar 28, 2020, 7:04 PM

#

I wanted to ask about the data of an image if it were to be opened as a text-file via the notepad application.

#

Appreciate anyone able to explain what I'd be looking at in a somewhat broad sense.

silk frigate Mar 28, 2020, 8:05 PM

#

Can someone tell me how I can get those data at the same 'level/row'? (The indexes don't matter)

📎 unknown.png

tepid thorn Mar 28, 2020, 11:13 PM

#

@silk frigate Is what you screenshotted all the data you are trying to fix?

silk frigate Mar 29, 2020, 12:10 AM

#

@tepid thorn yes

tepid thorn Mar 29, 2020, 12:11 AM

#

You could just hardcode the values into the NaN indexes or you could create a for loop that does that for you

silk frigate Mar 29, 2020, 12:12 AM

#

prob a for loop is better since it works all the time

#

also with more data values

#

but I don't know how to do that in this case 😬

#

I want to make a bar chart like this

📎 sphx_glr_barchart_001.png

#

But if I plot them now they aren't joined

#

(obviously)

tepid thorn Mar 29, 2020, 12:13 AM

#

you could create a for loop thats in the range of the index values that grabs the values you want and another for loop that put the values in the correct spot

#

If that made any sense lol

silk frigate Mar 29, 2020, 12:14 AM

#

Ehhh

#

Well not really, I mean I still wouldn't have any idea how to do that 😂

#

But I was just thinking

#

This is the result of two tables joined together

#

Maybe I can reset the indexes of both dataframes before I combine them?

#

and then they're the same (hopefully)

tepid thorn Mar 29, 2020, 12:16 AM

#

resetting the indexes would just replace the values starting at 0 but it won't move any values

silk frigate Mar 29, 2020, 12:17 AM

#

yeah but before I combine them

#

They start as two different dataframes of 5 values

tepid thorn Mar 29, 2020, 12:18 AM

#

ah thats why you get those NaN values that makes sense

#

What type of join are you doing?

#

Left join, right join, inner, outer?

silk frigate Mar 29, 2020, 12:23 AM

#

Wait I'll have to go now but I think I know how to do it

#

If I have any problems I'll let you know

marsh hull Mar 29, 2020, 12:35 AM

#

Guys, for those of you that build dashboards, how large is your dataset usually?

#

I'm trying to build a dashboard from a game... With enough data that shows all kinds of users etc. The entire dataset it'll return is about 17MB of json

#

What are the tricks for storing, or splitting this up?

agile anvil Mar 29, 2020, 5:12 AM

#

I am finally comfortable with this extrapolation which has the python code and data at bit.ly/covidGrowth

📎 download.png

wary siren Mar 29, 2020, 6:43 AM

#

Hey guys has anyone used mpld3?

#

Would like to convert matplotlib charts to html and send it to a frontend using flask

#

has anyone worked on such a thing?

lapis sequoia Mar 29, 2020, 8:39 AM

#

How can I filter a time series by the latest month-end date, in case the exact month end is a holiday?
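One possible reading of this question is "keep the last available row of each month, even when the calendar month end is missing from the data". Under that assumption, a sketch with a made-up business-day series (weekends stand in for holidays here; the data is hypothetical):

```python
import pandas as pd

# Hypothetical series with no rows on weekends.
idx = pd.bdate_range("2020-01-01", "2020-03-31")
series = pd.Series(range(len(idx)), index=idx)

# Group by (year, month) and keep the last row that actually exists.
last_per_month = series.groupby(
    [series.index.year, series.index.month]
).tail(1)
print(last_per_month)
```

Because the grouping only looks at rows that are present, 29 February 2020 (a Saturday) is skipped and 28 February is returned instead.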

jolly briar Mar 29, 2020, 12:20 PM

#

@agile anvil looks nice - are you going to set anything up to track how right/wrong your predictions were though?

#

@silk frigate if you post the code you use and example data it would be a lot easier

#

eg

In [19]: d1 = pd.DataFrame(dict(a=[1,2,3]))
In [20]: d2 = pd.DataFrame(dict(b=[1,2,3]))
In [21]: d1
Out[21]:
   a
0  1
1  2
2  3
In [22]: d2
Out[22]:
   b
0  1
1  2
2  3
In [23]: pd.concat([d1, d2])
Out[23]:
     a    b
0  1.0  NaN
1  2.0  NaN
2  3.0  NaN
0  NaN  1.0
1  NaN  2.0
2  NaN  3.0
In [24]: pd.concat([d1, d2], axis=1)
Out[24]:
   a  b
0  1  1
1  2  2
2  3  3

gaunt fiber Mar 29, 2020, 12:57 PM

#

Hi! I'm working in the business control department of a medium sized company (250 employees) and I've gotten interested in learning data science to broaden my own knowledge and to make our business more data driven. Our ERP is NAV2017 and we use Qliksense as a BI-tool. My initial goal is to analyze our data structure to find flaws, more precisely to find how many rows we have without key dimensions (i.e. dimensions such as brand in the OPEX, missing data in the item structure etc). I'm picturing running through the data and creating a visual report of how correct our data is (%) and from that add mandatory fields in the input for those dimensions. How would you recommend me going about this? What should I dive into? I'm thinking about learning Pandas and SQL but I'm not sure if those are the right tools for me. Hope you understand my question. Thanks.

balmy ocean Mar 29, 2020, 2:15 PM

#

@gaunt fiber

https://www.academia.edu/37886932/Data_Analysis_and_Visualization_Using_Python_-_Dr._Ossama_Embarak.pdf

Data Analysis and Visualization Using Python - Dr. Ossama Embarak.pdf

Academia.edu is a platform for academics to share research papers.

gaunt fiber Mar 29, 2020, 2:23 PM

#

@balmy ocean Thanks - I'll look into that!

balmy ocean Mar 29, 2020, 2:25 PM

#

Just read the chapters of the book that get you directly to what you need to solve your issues... if you do not yet know how to write useful pieces of code with Python, take your time and start from scratch... Happy end of month

gaunt fiber Mar 29, 2020, 2:42 PM

#

@balmy ocean Sounds good, thanks and same to you!

agile anvil Mar 29, 2020, 9:09 PM

#

@jolly briar yes, I am keeping copies of each version, and updating them with the latest data every day. At least a half dozen people have put one month reminders on my reddit post so they will all have a look then

jolly briar Mar 29, 2020, 9:10 PM

#

@agile anvil cheers, what are your thoughts on the amount of people who've been making graphs based on raw count data with no experience in the field? Do you think there's a risk of a sort of misinformation as a result?

agile anvil Mar 29, 2020, 9:11 PM

#

well yes, but it's not consequential. Nobody is going to buy different amounts of canned goods or TP for 10k deaths versus 10m deaths

agile anvil Mar 29, 2020, 9:43 PM

#

here's a great range survey from https://fivethirtyeight.com/features/experts-say-the-coronavirus-outlook-has-worsened-but-the-trajectory-is-still-unclear/ consistent with Fauci's 100-200k prediction today

📎 boice.EXPERT-SURVEY.0325-5fix.png

crisp widget Mar 30, 2020, 4:15 AM

#

Hey good day !!!
I’m doing a small project using spacy, I already have the Nouns from a big description, but I need to get only the products because I need to compare them with a DB
Any good ideas ?

final scaffold Mar 30, 2020, 5:05 AM

#

Hi! Is there any good source for learning dash plotly in python other than their official docs?

lapis sequoia Mar 30, 2020, 12:50 PM

#

have you tried the plotly samples

lapis sequoia Mar 30, 2020, 2:44 PM

#

how to separate multiple values in a single column?

📎 f585a81d-f274-4435-af59-dfc9d1fb9e90.png

#

data.genres.str.split(expand=True) . i used this one but it is splitting by line

rain palm Mar 30, 2020, 2:54 PM

#

@lapis sequoia Perhaps specify the separator? | in this case.

lapis sequoia Mar 30, 2020, 3:01 PM

#

this is the result . i thought if everything adds up in the same column it would be easy for me get count of each genre

📎 unknown.png

#

i will try other ways
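One way to get everything into a single column for counting, sketched under the assumption that `genres` holds `|`-separated strings (the screenshot is not available, so the sample rows below are made up):

```python
import pandas as pd

# Hypothetical rows; the real data is only visible in the screenshot.
data = pd.DataFrame(
    {"genres": ["Action|Comedy", "Comedy", "Drama|Action|Comedy"]}
)

# Split into lists, put each genre on its own row, then count.
genre_counts = data["genres"].str.split("|").explode().value_counts()
print(genre_counts)
```

Leaving out `expand=True` keeps the split result as one list per row, which is what `explode()` needs.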

half parrot Mar 30, 2020, 5:33 PM

#

Hi guys, I would like to train a regression model with one of boosting algorithms (e.g. lightGBM, XGBoost) and use one-hold-out cross-validation (patient-wise). I'd like to implement a custom loss function that minimizes mean absolute error and regularizes based on the maximum correlation between the reference and estimated values of the batch/fold. I'm new with Python and I would appreciate if anyone can help in this matter.

primal ravine Mar 30, 2020, 8:17 PM

#

Hey can someone please help me out, I'm trying to learn logistic regression through a Python implementation. But I don't understand what this code is really doing

#

📎 unknown.png

#

I don't understand why we are dividing the cost function by m or dividing dW by m

#

and how they relate to each other

lapis sequoia Mar 30, 2020, 8:31 PM

#

Is there any good resources to get started on sales forecasting?

velvet thorn Mar 31, 2020, 12:12 AM

#

@primal ravine mean error
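To spell out the "mean error" point: dividing by `m` (the number of samples) turns the summed loss and the summed gradient into averages, so their scale does not depend on the dataset size. The sketch below is a generic logistic-regression cost, not the code from the screenshot (which is not available); it assumes rows of `X` are samples, and the function name is made up.

```python
import numpy as np

def cost_and_grad(w, b, X, y):
    """Mean cross-entropy cost and its gradients for logistic regression."""
    m = X.shape[0]                           # number of samples
    a = 1.0 / (1.0 + np.exp(-(X @ w + b)))   # sigmoid predictions
    cost = -np.sum(y * np.log(a) + (1 - y) * np.log(1 - a)) / m
    dw = X.T @ (a - y) / m                   # gradient of the mean cost
    db = np.sum(a - y) / m
    return cost, dw, db

X = np.array([[0.0], [1.0], [2.0], [3.0]])
y = np.array([0.0, 0.0, 1.0, 1.0])
w, b = np.zeros(1), 0.0

cost, dw, db = cost_and_grad(w, b, X, y)
# Duplicating every sample doubles the sums but leaves the means unchanged.
cost2, dw2, db2 = cost_and_grad(w, b, np.vstack([X, X]), np.concatenate([y, y]))
print(cost, cost2)
```

Since `dw` is the derivative of the averaged cost, it carries the same `1/m` factor; that is how the two relate.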

#

@half parrot do you have a formula for that

lapis sequoia Mar 31, 2020, 4:06 AM

#

hey

#

so

#

for a data cube

#

i dont get when they say a data cube is a lattice of cuboids

#

what are the cuboids??

#

kinda confused by that

lapis sequoia Mar 31, 2020, 2:44 PM

#

Which module would you recommend for interpolation, data regression?

burnt wharf Mar 31, 2020, 3:08 PM

#

guys i m not able to install tensorflow 2 to use tensorflow_hub for my project

#

my version of tensorflow in jupyter notebook shows tensorflow 1.11

#

i am trying to install directly in notebook using !pip3 install tensorflow==2.0.0

#

but same version after install

#

any help guys?

oblique belfry Mar 31, 2020, 8:43 PM

#

Are you in a virtual environment?

tribal wagon Mar 31, 2020, 10:27 PM

#

Check your Python version; if it's 3.8 or above, tensorflow 2.0.0 won't install

mossy sand Apr 1, 2020, 1:19 AM

#

Would anyone here be able to assist me with a Matplotlib issue?

polar acorn Apr 1, 2020, 12:57 PM

#

!ask

arctic wedgeBOT Apr 1, 2020, 12:57 PM

#

Asking good questions will yield a much higher chance of a quick response:

• Don't ask to ask your question, just go ahead and tell us your problem.
• Don't ask if anyone is knowledgeable in some area, filtering serves no purpose.
• Try to solve the problem on your own first, we're not going to write code for you.
• Show us the code you've tried and any errors or unexpected results it's giving.
• Be patient while we're helping you.

You can find a much more detailed explanation on our website.

burnt wharf Apr 1, 2020, 1:21 PM

#

hey guys, i am trying to get embeddings of text data which is read in pandas dataframe and save in new column of dataframe which is (512,1) dimensions. I have got the embeddings but not able to save for each text row in new column with same index. it throws error.
ValueError: Length of values does not match length of index

```python
module_url = "https://tfhub.dev/google/universal-sentence-encoder/4" #@param ["https://tfhub.dev/google/universal-sentence-encoder/4", "https://tfhub.dev/google/universal-sentence-encoder-large/5"]
model = thub.load(module_url)
print ("module %s loaded" % module_url)
def embed(input):
    return model(input)
```
this is the model i m using to get embeddings
```python
for t in df['title'].iteritems():
    df = df.assign(emb_title = np.array(embed([t[1]])))
```
https://paste.pythondiscord.com/pigaqumuha.py this is the matrix i got after printing using this code 
```python
for t in df['title'].iteritems():
    print(np.array(embed([t[1]])))
```
can someone help me here?
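A likely cause of the error above is that each loop iteration tries to assign one `(1, 512)` array to the whole column, so its length does not match the index. A sketch of one alternative: embed all titles in one call and store one vector per row. `fake_embed` is a hypothetical stand-in for the TF Hub model so the example runs without TensorFlow; with the real model, the assumption is that `embed(list_of_strings)` returns something convertible to an array of shape `(n, 512)`.

```python
import numpy as np
import pandas as pd

def fake_embed(texts):
    # Hypothetical stand-in for the real model: one 512-d vector per text.
    rng = np.random.default_rng(0)
    return rng.normal(size=(len(texts), 512))

df = pd.DataFrame({"title": ["first title", "second title", "third title"]})

# Embed every title at once, then split the matrix into one array per row.
embeddings = np.asarray(fake_embed(df["title"].tolist()))
df["emb_title"] = pd.Series(list(embeddings), index=df.index)
print(df["emb_title"].iloc[0].shape)
```

Passing `index=df.index` keeps each vector on the same row as its title even if the frame does not use the default 0..n-1 index.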

vocal egret Apr 1, 2020, 5:30 PM

#

Anyone have experience rendering 3d data with plotly?

#

animating *

lapis sequoia Apr 1, 2020, 6:14 PM

#

Hello, any data analyst to help me with what can be analyzed from Medical Transcription dataset, if any please pm me. Thank you

oblique belfry Apr 1, 2020, 8:02 PM

#

In order to leverage new breakthroughs in AI and Machine Learning, it seems the common trend is to use a "backbone" network and extend it. (I am thinking of ResNet-101 for CV tasks and either BERT or GPT-2 for NLP tasks.) This approach makes sense because you do not want to reinvent the wheel and waste time and compute on building the backbone yourself.

I am uneasy doing this in the enterprise setting because I do not necessarily trust the precomputed weights. My question is: am I being overly cautious or how does one balance this dilemma of using new techniques with pre-computed weights you have not been able to verify yourself?

polar acorn Apr 1, 2020, 8:49 PM

#

I guess the proof would be in the pudding? If theres any point to using ML at all you should have a large dataset and metrics to score your solution by. Which means you could use transfer learning and do something from scratch and simply find out whats better in your case.

kind ermine Apr 1, 2020, 8:50 PM

#

Hi, I am a beginner when it comes to programming and I decided that the first language I try is Python. I completed a beginner's course and now know the basics, but I can't seem to find a project that suits my knowledge. I am interested in AI and machine learning. I will be very grateful if you could suggest something I can work on.

lament cargo Apr 1, 2020, 9:56 PM

#

hi @kind ermine ! i'd start by looking into linear regression and logistic regression

kind ermine Apr 1, 2020, 10:26 PM

#

Hi, I will surely look into that, but I want to start and complete a project just to keep my motivation up. I am familiar with the basics of Python like loops, lists and all that, but for now everything I find interesting, like machine learning projects or path finding programs, seems a bit complicated (I mean the coding part). I think it is because I am not familiar with the libraries and algorithms they include. I am looking for something to apply the little knowledge I gather during the quarantine.

#

Also, I am still in high school and I am unfamiliar with things like calculus (I don't know if I need things like that for a beginner project). I just like learning through projects and work a lot, and I did the beginner's course just to learn the fundamentals, but now I am a little confused with all the libraries and different things.

ripe forge Apr 2, 2020, 5:31 AM

#

So, part of the data science turf comes with its own libraries and algorithms. To make it simpler for you, numpy is for arrays that all libraries are built upon. Pandas is basically for tables. These two you'll just have to get familiar with at least somewhat whenever dealing with data

#

After that, you pick a library based on your task. Simple model? Scikit learn. Deep learning? Tensor flow. And so on.

#

So I'd say, don't worry, give it some time. And you'll want to get familiar with those two first, and then just one library for whatever task you're doing.

uncut shadow Apr 2, 2020, 5:57 AM

#

Or make it from scratch

kind ermine Apr 2, 2020, 6:22 AM

#

Ok thank you for the help

agile anvil Apr 2, 2020, 10:00 AM

#

Check these cool ER wait time graphs:

📎 ech-waits.png

arctic wedgeBOT Apr 2, 2020, 10:00 AM

#

Hey @agile anvil!

It looks like you tried to attach a Python file - please use a code-pasting service such as https://paste.pythondiscord.com

agile anvil Apr 2, 2020, 10:02 AM

#

Here's the source, https://paste.pythondiscord.com/ikomahacog.py you can see how to download your own ER's data at https://docs.google.com/spreadsheets/d/1Tqm1AU58VF2bvu_v81S8hgFFn2ukbEzVPb6HC24O2Dc

Google Docs

Emergency room wait times

Log

El Camino,ECH minutes,ECH patients,Stanford,Stanford minutes,Stanford patients,Diurnal hour.frac
2020-03-31 05:01,0,0, 2020-03-31 05:00,0,0,5.0
2020-03-31 05:10,0,0, 2020-03-31 05:10,0,0,5.2
2020-03-31 05:20,0,0, 2020-03-31 05:20,4,1,5.3
2020-03-31 05:31,0,0, 2020-03-...

rain palm Apr 2, 2020, 11:21 AM

#

@agile anvil Very cool graphs!

agile anvil Apr 2, 2020, 12:14 PM

#

🙂

maiden palm Apr 2, 2020, 1:05 PM

#

Any idea how I could make a "sector graph" ? More like an horizontal stacked bar graph but with redundant data type (only two)
Something like this:

📎 sector_graph_example.png

maiden palm Apr 2, 2020, 1:30 PM

#

Nevermind, found the matplotlib "barcode" model could do the job

harsh sapphire Apr 2, 2020, 4:42 PM

#

Check out missingno package too

restive peak Apr 2, 2020, 5:40 PM

#

Hi, so I'm currently using tesseract to attempt to do some OCR. The majority of the time the results are accurate however some digits randomly aren't read at all. An example of this would be this image:

📎 unknown.png

#

Where the 0 isn't picked up

#

However in all the other images that I input which has the exact same format and also contains 0's it picks it up

#

Was wondering what other image processing I could do to decrease the chances of values not getting picked up correctly.

#

Also as a side note I've set the custom config for single digits as without that it didn't pick up any of the single digits.

drifting hemlock Apr 2, 2020, 9:23 PM

#

Can someone enlighten me a little bit? I'm trying to build a data lake, and I have to admit that I'm very new in that area. We have information coming from multiple APIs and we want to store that information in an S3 bucket for further analysis. Is there a solution in AWS to automate that process? Or do I have to create a Python script and schedule an extraction task?

#

Let's say that I want to crawl the Studio Ghibli API (https://ghibliapi.herokuapp.com/films/) and store snapshots in a S3 bucket, is there a way to do this directly in the AWS console? Or do I have to build a script for it?

opaque stratus Apr 3, 2020, 1:21 AM

#

Hello,
Wondering if anyone have taken these courses; if so, what is the best approach/way to absorb the material. I know it's all personal, but i've never learned anything quite like this so I am open to ideas/opinions from people experienced with this domain. Thanks 😄

warm hollow Apr 3, 2020, 6:39 AM

#

Hey everyone. I'm not really sure how to do this but I had the idea. Theres this reddit thread: https://www.reddit.com/r/askreddit/comments/fu09ok

I'd like to parse it and make a word bubble that lists the occurrences for individual digits. Can anyone do this or point me towards how I may be able to?

r/AskReddit - You are 1 of 9,999,999 people asked. You get to keep ...

43 votes and 124 comments so far on Reddit

marsh swallow Apr 3, 2020, 8:08 AM

#

I think there's a Reddit API you can use that might answer this question. Google it and see what you find, it should come with a tutorial and documentation on how to use it.

lapis sequoia Apr 3, 2020, 8:41 AM

#

Hello, do you know a website or book to learn machine learning?

warm hollow Apr 3, 2020, 8:43 AM

#

I figured it out but the word bubble idea turned out to be dumb, too confusing

#

bar graph made more sense but not as hipster

plush raft Apr 3, 2020, 10:17 AM

#

@lapis sequoia i have hella books on python machine learning + some vids and examples i think. Message me

raw fractal Apr 3, 2020, 2:49 PM

#

Hi everybody, can someone quickly explain to me the usefulness of asynchronous programming ? (asyncio package python)

ripe forge Apr 3, 2020, 3:01 PM

#

from the pic, i can't tell what makes red and magenta squares different

shell quartz Apr 3, 2020, 4:23 PM

#

Hi - was wondering if anyone could answer a question regarding k-means clustering.

I've been reading guides and looking at examples like this one: https://github.com/corvasto/Simple-k-Means-Clustering-Python where the data read in is simple values in two columns.

For my assignment I have been given data in the form: (animal, countries, fruits, veggies all in separate files)
eg Animal file -
elephant -0.015926 -0.079864 ...
leopard 0.47727 -0.91587 ...
dog -0.33575 0.38897 ...
etc

So I'm confused how this will work when it comes to plotting the data.
Appreciate any help

GitHub

corvasto/Simple-k-Means-Clustering-Python

Simple k-means clustering (centroid-based) using Python - corvasto/Simple-k-Means-Clustering-Python

willow holly Apr 3, 2020, 6:10 PM

#

Hi, looking for advice. Every day I scan Salesforce to find new opportunities and update the data accordingly. I am looking for a library or method that will store the last_scan_day and create a timestamp afterwards, so that the next day, when the code scans for new opportunities, it knows what the last_scan_day was, only checks the data where Date > last_scan_day, and updates the timestamp with a new date. And so on. It is a very high-level description of what I am trying to do, but if anyone can direct me to the right Python methods, that would be awesome.
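The "remember the last scan day" part can be done with the standard library alone. The sketch below keeps the date in a small JSON file; the file name, the `scan` helper and the record layout (dicts with a `Date` field) are all hypothetical, and the actual Salesforce query is left out.

```python
import json
import tempfile
from datetime import date
from pathlib import Path

STATE_KEY = "last_scan_day"

def load_last_scan(path, default=date.min):
    """Return the stored scan day, or a default on the very first run."""
    path = Path(path)
    if not path.exists():
        return default
    return date.fromisoformat(json.loads(path.read_text())[STATE_KEY])

def save_last_scan(path, day):
    Path(path).write_text(json.dumps({STATE_KEY: day.isoformat()}))

def scan(records, state_path, today):
    """Keep records newer than the last scan, then update the stored day."""
    last = load_last_scan(state_path)
    fresh = [r for r in records if r["Date"] > last]
    save_last_scan(state_path, today)
    return fresh

# Hypothetical records standing in for the real opportunities.
records = [
    {"name": "A", "Date": date(2020, 4, 1)},
    {"name": "B", "Date": date(2020, 4, 3)},
]
with tempfile.TemporaryDirectory() as tmp:
    state_file = Path(tmp) / "scan_state.json"
    first_run = scan(records, state_file, today=date(2020, 4, 2))
    second_run = scan(records, state_file, today=date(2020, 4, 3))
print(len(first_run), [r["name"] for r in second_run])
```

With a strict `>` comparison, records created later on the scan day itself would be skipped the next day; storing a full timestamp instead of a date is one way around that.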

lapis sequoia Apr 3, 2020, 10:04 PM

#

Hello fellow python enthusiasts. I got a numpy related question. If I have a 2d array nxm, and I want to make it into a bigger 4d mxnxmxn matrix, how would I do that? Basically I'm asking how to vectorise this function:


```python
array = np.arange(12).reshape((4,3)) #m = 4, n = 3

#how to vectorise this function?
def makeBigger(a):
    ans = np.ones(a.shape[0]*a.shape[1]*a.shape[0]*a.shape[1])
    ans = ans.reshape((a.shape[0], a.shape[1], a.shape[0], a.shape[1]))
    for i in range(a.shape[0]):
        for j in range(a.shape[1]):
            ans[i][j][:][:] = a[i][j]
    return ans

print(makeBigger(array))
#prints the matrix I am looking for.
```

silent swan Apr 3, 2020, 10:48 PM

#

tricky. I think you can get one dimension for free but not two

#

oh wait you're just copying across both new dimensions

lapis sequoia Apr 3, 2020, 10:49 PM

#

do i use np.copy?

silent swan Apr 3, 2020, 10:49 PM

#

give me a second, should be straightforward

#

new_array = np.empty([4, 3, 10, 11])
new_array[:, :, :, :] = array.reshape(4, 3, 1, 1)

#

could also simplify to new_array[:]
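The same broadcasting idea, sketched with the exact `(m, n, m, n)` shape from the question. `np.broadcast_to` is used here as an alternative to assigning into an empty array; the trailing `.copy()` makes the result writable. Unlike `makeBigger`, this keeps the integer dtype of the input instead of producing floats.

```python
import numpy as np

array = np.arange(12).reshape((4, 3))
m, n = array.shape

# Add two trailing axes of length 1, then broadcast them to (m, n).
bigger = np.broadcast_to(array[:, :, None, None], (m, n, m, n)).copy()

# Every (i, j) block is filled with the single value array[i, j].
print(bigger.shape, bigger[2, 1])
```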

lapis sequoia Apr 3, 2020, 10:53 PM

#

I don't see how that's a solution

silent swan Apr 3, 2020, 10:54 PM

#

what do you mean

#

for i in range(array.shape[0]):
    for j in range(array.shape[1]):
        assert (new_array[i][j][:][:] == array[i][j]).all()

lapis sequoia Apr 3, 2020, 10:56 PM

#

oh i see

#

I'll test it

#

Sweet it works! Thanks bro

agile anvil Apr 4, 2020, 12:30 AM

#

lockdown reading https://fivethirtyeight.com/features/why-its-so-freaking-hard-to-make-a-good-covid-19-model/

FiveThirtyEight

maggiekb538

Why It’s So Freaking Hard To Make A Good COVID-19 Model

Here we are, in the middle of a pandemic, staring out our living room windows like aquarium fish. The question on everybody’s minds: How bad will this really ge…

oak grail Apr 4, 2020, 1:09 PM

#

@agile anvil
This is a fucking great article!

rancid dove Apr 4, 2020, 3:24 PM

#

Throwing this here since its a pandas question.

Anyone ever use custom accessors with pandas? If i make a custom accessor for a dataframe. If I perform a groupby, is that accessor available there? I'm guessing the groupby objects dont inherit from dataframe, or maybe they do.

lapis sequoia Apr 4, 2020, 4:32 PM

#

what do you mean custom accessor

#

you create a new df when you do a groupby

gaunt fiber Apr 4, 2020, 4:49 PM

#

super basic question about Pandas but I cant seem to find it on google - does In [*****] means that it is calculating/working? Seems to be very slow for me

rancid dove Apr 4, 2020, 4:52 PM

#

https://pandas.pydata.org/pandas-docs/stable/development/extending.html

#

@lapis sequoia

#

Ive decided to use their example and test it, it doesnt work the way I was hoping

lament orchid Apr 4, 2020, 6:25 PM

#

ok guys
I am having a problem with adding a list to a pandas dataframe
I have tried to use pd.Series(mylist) to add it into my dataframe
however it turns the values of my list to "nan"
what is occurring here?
any ideas?

sand girder Apr 4, 2020, 6:51 PM

#

@gaunt fiber Is this in a Jupyter Notebook? Then yes

lament orchid Apr 4, 2020, 6:54 PM

#

hello

#

@Moderators

sand girder Apr 4, 2020, 6:54 PM

#

@lament orchid could you post the code please? It's possible you're building your list incorrectly/not how you expected, or your list is not the same length as the dataframe

lament orchid Apr 4, 2020, 6:54 PM

#

yeah its not the same length

#

thats my issue
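
A small sketch of why this can happen, assuming the list is wrapped in a Series before assignment: pandas lines a Series up with the dataframe by index label, not by position, so any row without a matching label becomes NaN. The frame and column names below are made up for illustration.

```python
import pandas as pd

# Hypothetical frame whose index is not 0, 1, 2
df = pd.DataFrame({"a": [10, 20, 30]}, index=["x", "y", "z"])
values = [1, 2, 3]

# The Series gets the default index 0, 1, 2, which matches none of x, y, z -> all NaN
df["aligned"] = pd.Series(values)

# Reusing the frame's own index makes the values land row by row
df["by_position"] = pd.Series(values, index=df.index)

# A shorter list can be attached to the first rows only; the remaining rows stay NaN
short = [1, 2]
df["padded"] = pd.Series(short, index=df.index[: len(short)])
```

Assigning a plain list of the wrong length (without `pd.Series`) raises a ValueError instead, which is usually easier to notice.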

south quest Apr 4, 2020, 6:55 PM

#

@lament orchid Why did you ping moderators, are you in need of something moderating?

lament orchid Apr 4, 2020, 6:55 PM

#

arnt mods good at python?

#

why else would you be mods

south quest Apr 4, 2020, 6:55 PM

#

to moderate

#

pinging moderators is intended for attracting attention of mods if something bad is happening, there isn't a priority service if you ping staff. please only ping in future if you need something moderating.

solar torrent Apr 4, 2020, 7:14 PM

#

hey could someone help me out with a pandas question in #help-orange . I'm trying to access cols of a df using .iloc at two different positions

#

nvm. I'm good now

paper spindle Apr 4, 2020, 9:43 PM

#

I have a list of date data, like Sun May 19 00:53:53 2019 +0300, Mon May 20 20:01:07 2019 +0300, Mon Dec 16 01:02:47 2019 +0300 etc

#

what would be best way to plot it

#

Tue Nov 19 11:16:46 2019 +0300
Sun Oct 20 23:13:54 2019 +0300
Tue Aug 27 00:45:37 2019 +0300
Thu May 23 04:13:16 2019 +0300
Tue May 21 23:27:36 2019 +0300
Tue May 21 20:47:42 2019 +0300
Mon May 20 23:27:10 2019 +0300
Mon May 20 20:01:07 2019 +0300
Sun May 19 01:10:20 2019 +0300
Sun May 19 00:53:53 2019 +0300

this is an example data, and I want to kind of see it in daily basis

#

with line graphs

#

but I couldnt find the correct thing to display it

oak grail Apr 4, 2020, 9:46 PM

#

What values do you want from that data?
I only see the differences in dates and times...

#

Or do you have a larger dataset?

paper spindle Apr 4, 2020, 10:24 PM

#

these are commit dates, and I want to see a line graph of how many commits authored in last year

jolly briar Apr 4, 2020, 10:48 PM

#

@paper spindle over the last year? Wouldn't that just be a case of counting the rows? Or do you want a graph of the commits for each day over the last year (so may 21 above would have a value of 2)

paper spindle Apr 4, 2020, 10:49 PM

#

yep, the latter

jolly briar Apr 4, 2020, 10:52 PM

#

I'm on my phone, but I think you should be able to convert into a date time vector, aggregate then plot?

#

The time information isn't important as I understand it
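
A rough sketch of the "convert, aggregate, plot" idea using the sample dates above. It assumes pandas is available and that the month and weekday names are in English; the variable names are only illustrative.

```python
import pandas as pd

raw_dates = [
    "Tue Nov 19 11:16:46 2019 +0300",
    "Sun Oct 20 23:13:54 2019 +0300",
    "Tue Aug 27 00:45:37 2019 +0300",
    "Thu May 23 04:13:16 2019 +0300",
    "Tue May 21 23:27:36 2019 +0300",
    "Tue May 21 20:47:42 2019 +0300",
    "Mon May 20 23:27:10 2019 +0300",
    "Mon May 20 20:01:07 2019 +0300",
    "Sun May 19 01:10:20 2019 +0300",
    "Sun May 19 00:53:53 2019 +0300",
]

# Parse the timestamps; %z keeps the +0300 offset so days are counted in local time
commits = pd.Series(pd.to_datetime(raw_dates, format="%a %b %d %H:%M:%S %Y %z"))

# Drop the time of day and count commits per calendar day
per_day = commits.dt.strftime("%Y-%m-%d").value_counts().sort_index()

# per_day.plot() would draw the line graph if matplotlib is installed
```

Days without commits are simply missing here; reindexing onto a full date range would fill them with zeros if the graph should show the gaps.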

late monolith Apr 4, 2020, 11:02 PM

#

Does anyone know how to normalize a wave function using numpy?

lapis sequoia Apr 5, 2020, 3:39 PM

#

hi

#

can someone help me reinforce this.. I'm learning partition by today.. I understand it's part of Over.. and it helps split the table into something it's going to be filtered by eventually, so it's lighter to handle and easier to run on large tables

#

(select avg(quantity), order_id, year
from salestable1
group by order_id, year);

-- same thing using partition by
select distinct year, order_id, avg(quantity) over(partition by year, order_id) as avg_books
from salestable1
group by order_id, year, quantity;

#

but I'm having trouble reinforcing this.. it'd be nice if someone explained the logic to me briefly
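
One way to see the difference, sketched in pandas rather than SQL (the rows are invented, the column names follow the queries above): GROUP BY collapses each group into one row, while a window with PARTITION BY keeps every row and attaches the group's value to it. As far as I know it is about scoping the calculation, not about making the table lighter.

```python
import pandas as pd

# Made-up rows standing in for salestable1
sales = pd.DataFrame({
    "year": [2019, 2019, 2019, 2020],
    "order_id": [1, 1, 2, 3],
    "quantity": [2, 4, 5, 1],
})

# Like GROUP BY order_id, year: one output row per group
grouped = sales.groupby(["year", "order_id"], as_index=False)["quantity"].mean()

# Like AVG(quantity) OVER (PARTITION BY year, order_id): all rows kept, average repeated per group
windowed = sales.assign(
    avg_books=sales.groupby(["year", "order_id"])["quantity"].transform("mean")
)
```

That is also why the second query needs DISTINCT to look like the first one: the window version still has one row per original sale.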

rain palm Apr 5, 2020, 3:54 PM

#

@lapis sequoia Perhaps it might help to visualise it: https://rextester.com/IISV27995

#

@lapis sequoia Also, #databases is the best section to ask in, for next time!

split steppe Apr 5, 2020, 9:10 PM

#

which library do y'all prefer for geospatial plotting?

restive peak Apr 5, 2020, 11:16 PM

#

Hi, so I'm currently using tesseract to attempt to do some OCR. The majority of the time the results are accurate however some digits randomly aren't read at all. An example of this would be this image:

Where the 0 isn't picked up
However in all the other images that I input which has the exact same format and also contains 0's it picks it up
Was wondering what other image processing I could do to decrease the chances of values not getting picked up correctly.
Also as a side note I've set the custom config for single digits as without that it didn't pick up any of the single digits.

lapis sequoia Apr 6, 2020, 12:05 AM

#

@rain palm I get this error remaining connection slots are reserved for non-replication superuser connections

rain palm Apr 6, 2020, 12:27 AM

#

@lapis sequoia Was about to go, but here are some links which may help. If not, perhaps ask in #databases.
https://dba.stackexchange.com/questions/120694/postgresql-remaining-connection-slots-are-reserved-for-non-replication-superuse
https://stackoverflow.com/questions/11847144/heroku-psql-fatal-remaining-connection-slots-are-reserved-for-non-replication
https://github.com/wagtail/wagtail/issues/1242

lapis sequoia Apr 6, 2020, 12:43 AM

#

hi

#

can i get an help installing cv2 please (im using anaconda3)?

knotty hamlet Apr 6, 2020, 11:05 AM

#

conda install opencv

deft harbor Apr 6, 2020, 4:38 PM

#

📎 5646.png

wild spoke Apr 6, 2020, 4:44 PM

#

How do i reshape data of shape (167076, 66) to shape with 5 time steps for LSTM network... i am getting value error for using : X_train.reshape(int(X_train.shape[0]/5), 5, X_train.shape[1])
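
The ValueError is most likely because 167076 is not divisible by 5 (167076 = 5 * 33415 + 1), so the rows cannot be split evenly into blocks of 5. A minimal sketch, assuming X_train is a NumPy array and that non-overlapping windows are what the model needs; `to_windows` is a made-up helper name.

```python
import numpy as np

def to_windows(X, steps):
    """Cut X (samples, features) into non-overlapping windows of `steps` rows."""
    usable = (X.shape[0] // steps) * steps   # drop the leftover rows at the end
    return X[:usable].reshape(usable // steps, steps, X.shape[1])

X_train = np.zeros((167076, 66), dtype=np.float32)  # placeholder data with the same shape
windows = to_windows(X_train, 5)                     # shape (33415, 5, 66)
```

If overlapping (sliding) windows are wanted instead, the reshape trick does not apply and the windows have to be built another way.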

covert skiff Apr 6, 2020, 8:13 PM

#

good morning, how do I combine in general a timeseries analysis with additional datapoints, for example.. the stock price of a company..with including the gdp development or something like that ?

gentle depot Apr 6, 2020, 9:20 PM

#

Hello,
If I use anaconda, or anything inside what comes in the bundle, do I have to reference or give credit in a research work like a thesis?

#

Same question for public domain data such as iris dataset

wintry atlas Apr 6, 2020, 9:37 PM

#

Hi all,

I have no idea why this won't run when I try to apply

    def classlto(df):
    if df[df['prevclass'] < df['classn'] | df['won'] == 1]
        return df['bf_decimal_sp']-1
    elif df[df['prevclass'] < df['classn'] | df['won'] == 0]
        return -1

#

I was pondering that it's as there is not absolute True or False statement, but I believe that there is.
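
A few things stop the snippet above from running: the `if`/`elif` lines have no trailing colon, the function body is not indented, `|` binds tighter than `<` and `==` so each comparison needs its own parentheses, and a whole dataframe cannot be used as a single True/False in an `if`. A hedged sketch of one fix, applying the function row by row. The sample data is invented, and `or` is kept only because the original used `|`; `and` may be what was intended.

```python
import pandas as pd

def classlto(row):
    # Works on ONE row at a time, so the conditions are plain booleans
    if (row["prevclass"] < row["classn"]) or (row["won"] == 1):
        return row["bf_decimal_sp"] - 1
    elif (row["prevclass"] < row["classn"]) or (row["won"] == 0):
        return -1
    return None  # neither branch matched

# Made-up rows using the column names from the question
races = pd.DataFrame({
    "prevclass": [1, 3, 3],
    "classn": [2, 2, 2],
    "won": [0, 1, 0],
    "bf_decimal_sp": [3.0, 5.0, 4.0],
})

races["lto"] = races.apply(classlto, axis=1)
```

For large frames `numpy.where` or `numpy.select` on whole columns would be faster, but the row-wise version stays closest to the original.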

#

More generally I am wondering if there is a better way to frame a thesis when exploring a dataset

copper umbra Apr 7, 2020, 1:58 AM

#

Any data scientists want to dive into interpreting a flatten the curve covid python model (partial code i need to fix) with me...

agile anvil Apr 7, 2020, 2:53 AM

#

This is the "bending the curve" that we are all staying home for. Animation source: https://paste.pythondiscord.com/rajuzixoke.py Static source at bit.ly/covidGrowth

📎 covid.mp4

agile anvil Apr 7, 2020, 3:21 AM

#

@copper umbra sure! 😄

#

maybe my repl.it for the static graphs helps? http://bit.ly/covidGrowth

repl.it

jsalsman

COVID19USgrowthExtrapolation

U.S. COVID-19 infection and death extrapolations fit to cumulative lognormal distributions in the log domain

copper umbra Apr 7, 2020, 3:34 AM

#

@agile anvil i am a state employee data analyst, but probably one of the closest we have to a potential data scientist, so they threw a project in my lap this evening

#

to interpret a statistical model for how social distancing affects the curve

#

the problem is the example code i was sent is in pieces, it is 2 defs and no execution code and has missing references.

#

i spent a few hours this evening trying to figure it out and am struggling

#

the model you sent is meant to predict the current growth based on data correct

agile anvil Apr 7, 2020, 4:27 AM

#

@copper umbra yes; sorry for my delay I have to be AFK for a little while. If you want to DM me code please do, I promise to keep it confidential.

copper umbra Apr 7, 2020, 4:28 AM

#

I am about to go to bed almost midnight here. But this code is not private. I will dm you tomorrow

agile anvil Apr 7, 2020, 6:44 AM

#

very good

whole rampart Apr 7, 2020, 10:09 AM

#

Hi, I'm new to this server and am wondering if this would be the correct place to ask if anyone has experience with generating netCDF files from a folder of .tiff files?

velvet thorn Apr 7, 2020, 12:20 PM

#

@whole rampart probably the wrong channel...

#

hm, actually

#

thinking about it

#

it's either this or a help channel, but that's a somewhat specialised query

whole rampart Apr 7, 2020, 12:24 PM

#

I've found 1 thread with a similar problem on stackoverflow. I'll come back here with more information if I am unsuccessful

worn river Apr 7, 2020, 12:58 PM

#

@wild spoke Try using TimeseriesGenerator from keras.preprocessing.sequence

#

Automatically reshapes the data into 3 dimensional format

steady leaf Apr 7, 2020, 1:05 PM

#

Has anyone done data science projects for a coffeeshop and or bakery?

polar acorn Apr 7, 2020, 1:07 PM

#

I swear somebody was asking about that not too long ago. Is that for a course or something?

steady leaf Apr 7, 2020, 1:14 PM

#

Really??

#

Its actually for my own bakery cafe haha

#

Wow a course on this would be awesome

thin terrace Apr 7, 2020, 2:38 PM

#

Hey,

I have two arrays that I have normalized between 0.0 and 1.0 using sklearn.preprocessing.MinMaxScaler. I want a measure of how well these coincide so I thought calculating the entropy between them could be useful (?).

I tried doing something like scipy.stats.entropy(arr1, arr2) but quickly realized that's not how it works. Any ideas how to do this? I'm expecting a single float as output value

valid drum Apr 7, 2020, 4:18 PM

#

Hi, question about glorot_uniform initializer.
In convolutional nets, what are the fan_in and fan_out and how can I calculate them?
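
A sketch of the usual convention, assuming a Keras-style kernel layout of (kernel_height, kernel_width, in_channels, out_channels): fan_in is the receptive field size times the input channels, fan_out is the receptive field size times the output channels, and Glorot uniform samples from [-limit, limit] with limit = sqrt(6 / (fan_in + fan_out)). The function names are made up.

```python
import numpy as np

def conv_fans(kernel_shape):
    """Return (fan_in, fan_out) for a kernel shaped (*spatial, in_channels, out_channels)."""
    *spatial, in_channels, out_channels = kernel_shape
    receptive_field = int(np.prod(spatial))  # e.g. 3 * 3 = 9; equals 1 for a dense layer
    return in_channels * receptive_field, out_channels * receptive_field

def glorot_uniform(kernel_shape, rng=None):
    rng = np.random.default_rng() if rng is None else rng
    fan_in, fan_out = conv_fans(kernel_shape)
    limit = np.sqrt(6.0 / (fan_in + fan_out))
    return rng.uniform(-limit, limit, size=kernel_shape)

fan_in, fan_out = conv_fans((3, 3, 16, 32))   # 3x3 kernel, 16 -> 32 channels
weights = glorot_uniform((3, 3, 16, 32), rng=np.random.default_rng(0))
```

Other frameworks store the kernel axes in a different order, so the unpacking would need adjusting there.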

oblique belfry Apr 7, 2020, 7:09 PM

#

Begin rant.

Nothing is more boring than refactoring someone's bad python code that is a port of another person's poor Matlab code. Unfortunately, this is the fastest way for me to get this thing done. ML sucks at times. Refactoring shitty algorithms from Lua or Matlab is not how I want to spend my Tuesdays.

Rant over.

Hope everyone is having a great day in ML land.

main coral Apr 8, 2020, 12:44 AM

#

Hey everyone! Need desperate help with a neural network problem I am working on. I’m self-taught and so far this is the only thing I have had trouble finding the answers for online. If you're familiar with CNNs, spotlight, or recommender systems please please please message me

lament cargo Apr 8, 2020, 1:28 AM

#

@main coral i'm learning too, whats the issue youre running into in case i might know anything about it?

agile anvil Apr 8, 2020, 6:13 AM

#

@oblique belfry just be glad you don't have to port someone's minpack

oblique belfry Apr 8, 2020, 3:05 PM

#

@agile anvil You are right. I will be thankful for my current situation. 😄

shrewd trellis Apr 8, 2020, 3:29 PM

#

Hey guys I have a dataset with approx 500 images (kinda small) with 5 classes with high resolution images and small detail so I can’t resize much ...

Should I use a pre trained network? I try with resnet50 and vgg16 with my own input and I’m always around 30% accuracy which is pretty bad. Any idea or paper I should look ?

arctic wedgeBOT Apr 8, 2020, 4:17 PM

#

Hey @trail pagoda!

It looks like you tried to attach file type(s) that we do not allow (.txt). We currently allow the following file types: .3gp, .3g2, .avi, .bmp, .gif, .h264, .jpg, .jpeg, .m4v, .mkv, .mov, .mp4, .mpeg, .mpg, .png, .tiff, .wmv, .svg, .psd, .ai, .aep, .xcf, .mp3, .wav, .ogg, .md.

Feel free to ask in #community-meta if you think this is a mistake.

trail pagoda Apr 8, 2020, 4:17 PM

#

Does anyone know how to properly interpret the output of a torch utils bottleneck run on your code

📎 unknown.png

#

My GPU is only at 20% util and I don't know what all these "Fills" mean

steady dome Apr 8, 2020, 6:49 PM

#

anybody here experienced using pandas dataframe?

I have a two column dataframe with a string value in the first column and an int value in the second column. I need to write my code to find where the string in the first column matches some string, and then I next need the output to be the int value in the second column at that same row.

df.loc and df.iloc look similar but presume you have both the row and column known which I don't, only the column. I think maybe I'm supposed to use df.at / df.iat but same problem, these functions need information that changes.

And I read that I should be able to do

df.loc[df['Col1'] == mystring, ['Col2']]

to get a specific column instead of the whole row... but it's returning the value in a dataframe format. I need to be able to get it into a number (int) format eventually.

reef bone Apr 8, 2020, 7:09 PM

#

hmm, I can show you a possible way to do this but I probably cannot show you the best way to do this

#

i've never been able to figure out how to use pandas effectively so I sometimes write hacky / ugly solutions

#

I think what you have is a good start

#

let's work with this as an example dataframe

#

import pandas

dataframe = pandas.DataFrame(
    {
        "Names": ["Cat", "Dog", "Bird"],
        "Ages": [12, 40, 36],
    }
)

#

say we want to get the age of Cat

#

I will first get the rows where Names matches Cat

#

>>> dataframe.loc[dataframe["Names"] == "Cat"]
  Names  Ages
0   Cat    12

#

looks good so far

steady dome Apr 8, 2020, 7:12 PM

#

I've been able to do that so far^

reef bone Apr 8, 2020, 7:13 PM

#

I can then grab the Ages column

#

>>> dataframe.loc[dataframe["Names"] == "Cat"]["Ages"]
0    12
Name: Ages, dtype: int64

#

so this is a Series object

#

which is iterable, so we should be able to unpack it

#

>>> [output] = dataframe.loc[dataframe["Names"] == "Cat"]["Ages"]
>>> output
12

#

thats our int

#

alternatively, I think we can grab it by index using iloc

#

yea works too

#

>>> dataframe.loc[dataframe["Names"] == "Cat"]["Ages"].iloc[0]
12

#

actually we dont need iloc, the series is subscriptable

#

>>> dataframe.loc[dataframe["Names"] == "Cat"]["Ages"][0]
12

#

works too
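
One caveat, sketched with the same example frame: plain `[0]` on a Series with an integer index looks up the label 0, not the first position. It works for Cat only because Cat happens to sit in the row labelled 0; for any other row `.iloc[0]` is the safer choice.

```python
import pandas as pd

dataframe = pd.DataFrame(
    {
        "Names": ["Cat", "Dog", "Bird"],
        "Ages": [12, 40, 36],
    }
)

bird_ages = dataframe.loc[dataframe["Names"] == "Bird"]["Ages"]  # index label is 2

first_by_position = bird_ages.iloc[0]  # position 0 -> 36
by_label = bird_ages[2]                # label 2 -> 36

try:
    bird_ages[0]                       # there is no label 0 in this Series
    label_zero_failed = False
except KeyError:
    label_zero_failed = True
```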

steady dome Apr 8, 2020, 7:15 PM

#

oh wait did I do my "[]s" wrong?

#

oh my gosh one sec

reef bone Apr 8, 2020, 7:16 PM

#

this works too (although admittedly I don't really know why)

#

>>> dataframe.loc[dataframe["Names"] == "Cat", ["Ages"]]
   Ages
0    12

#

but it still gives a dataframe

#

so it's still 2D and we need to iloc the value via both dims

#

>>> dataframe.loc[dataframe["Names"] == "Cat", ["Ages"]].iloc[0, 0]
12

#

lots of ways
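
On why the comma form works: `.loc` accepts a row selector and a column selector together, `.loc[rows, columns]`. A list of columns (`["Ages"]`) gives back a dataframe, while a single column name (`"Ages"`) gives back a Series, which needs only one more step to reach the number. A short sketch with the same frame:

```python
import pandas as pd

dataframe = pd.DataFrame(
    {
        "Names": ["Cat", "Dog", "Bird"],
        "Ages": [12, 40, 36],
    }
)

is_cat = dataframe["Names"] == "Cat"

as_frame = dataframe.loc[is_cat, ["Ages"]]   # list of columns -> DataFrame (2D)
as_series = dataframe.loc[is_cat, "Ages"]    # single column name -> Series (1D)

age = int(as_series.iloc[0])                 # plain Python int
```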

steady dome Apr 8, 2020, 7:20 PM

#

Frustrating because I see this working for you and I am still having issues so I think I need to go back and look at my dataframe and see if there is something funky there? idk.

37 323313
Name: Col2, dtype: object

reef bone Apr 8, 2020, 7:21 PM

#

maybe you can show me just the code that you use

#

so it looks like that should be row 37 and the value is 323313

steady dome Apr 8, 2020, 7:21 PM

#

yup

reef bone Apr 8, 2020, 7:22 PM

#

what happens when you do [0] on it

steady dome Apr 8, 2020, 7:22 PM

#

infringing = DLdf.loc[DLdf['Title'] == title]['Unique Infringements']

#

sec, having some other weird error pop up-

reef bone Apr 8, 2020, 7:24 PM

#

no rush

steady dome Apr 8, 2020, 7:24 PM

#

.iloc[0]
gives
IndexError: single positional indexer is out-of-bounds

#

.iloc[0,0]
gives
IndexingError: Too many indexers

#

and just [0] at the end gives KeyError: 0

reef bone Apr 8, 2020, 7:26 PM

#

ok, so there's nothing in your series

#

looks like there are no rows where the title was found

#

is that possible?

steady dome Apr 8, 2020, 7:26 PM

#

yea, possible

#

until things are updated, likely lmao

reef bone Apr 8, 2020, 7:26 PM

#

try to search for something that is present

#

if the series is of length 0, then the index 0 will be out of bounds

#

by the way, since the series is iterable, it may be easier for you to work with it as a python list

#

>>> list(dataframe.loc[dataframe["Names"] == "Cat"]["Ages"])
[12]

#

so if i search for an animal that doesn't exist in my dataframe, i should get an empty list

#

>>> list(dataframe.loc[dataframe["Names"] == "Elephant"]["Ages"])
[]

#

of course if the animal was present many times, then there would be many ages in my list

#

these are all cases that need to be accounted for, depending on what kind of data you have and how many assertions you can make about it

steady dome Apr 8, 2020, 7:30 PM

#

ok so I can set up a try/except and that shows me that I CAN get the numbers out of there:

#

if I do .iloc[0] when the number is present, it gives me a number!

reef bone Apr 8, 2020, 7:31 PM

#

excellent

steady dome Apr 8, 2020, 7:32 PM

#

so when I do .iloc[0] on the returned dataframe... since it's a single row, iloc is looking for the first item in that row (index starts at 0) correct?

reef bone Apr 8, 2020, 7:32 PM

#

yes

steady dome Apr 8, 2020, 7:32 PM

#

OK good good that's how I visualized it/ understood it

#

it's a relief something makes sense for once!

reef bone Apr 8, 2020, 7:33 PM

#

a try-except would work, but maybe it'd be nicer to look at the length of the resulting series

#

>>> len(dataframe.loc[dataframe["Names"] == "Elephant"]["Ages"])
0
>>> 
>>> len(dataframe.loc[dataframe["Names"] == "Cat"]["Ages"])
1

#

it kinda depends on what you're looking to do next

steady dome Apr 8, 2020, 7:34 PM

#

Yea, I had gotten halfway there so at some point it was
if not df.empty:

because the times where it can't find the title I need to hard code it to 0 (and I was getting errors when I was looking for the first in a zero length row)
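
A possible shape for the "fall back to 0 when the title is missing" case, as a sketch. The helper name `lookup_int` and the sample rows are made up; the column names are the ones from the snippet above, and the values are stored as strings to mimic the `dtype: object` output.

```python
import pandas as pd

def lookup_int(df, key_col, key, value_col, default=0):
    """Return the first matching value as an int, or `default` if nothing matches."""
    matches = df.loc[df[key_col] == key, value_col]
    if matches.empty:
        return default
    return int(matches.iloc[0])

# Hypothetical stand-in for DLdf
DLdf = pd.DataFrame({
    "Title": ["Some Title", "Other Title"],
    "Unique Infringements": ["323313", "17"],
})

found = lookup_int(DLdf, "Title", "Some Title", "Unique Infringements")
missing = lookup_int(DLdf, "Title", "Not In The Sheet", "Unique Infringements")
```

If a title can appear more than once, this quietly takes the first match, which may or may not be what is wanted.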

#

but now that I chopped away at this so much I need to go back to my notebook and write out what I want to do then I can go back to my keyboard and fix it up

reef bone Apr 8, 2020, 7:34 PM

#

right haha, ok

steady dome Apr 8, 2020, 7:34 PM

#

thank you so so much!

reef bone Apr 8, 2020, 7:35 PM

#

no worries, glad I could help

#

by the way, it's interesting how the row filtering works

#

>>> dataframe["Names"] == "Cat"
0     True
1    False
2    False
Name: Names, dtype: bool

#

this gives a series of bools, which tell you where the condition holds and where it doesnt

steady dome Apr 8, 2020, 7:35 PM

#

I was halfway there (my set up was pretty much almost there) but I was getting tripped up by the errors I got because I was trying to get the first item in the list even when the list was empty and that threw the error.

#

Yea I was reading up pandas docs to try to find out what I am supposed to use here and it looks like there is a lot of stuff that is True/False

#

I didn't see how I could use that?

reef bone Apr 8, 2020, 7:36 PM

#

yeah, it may be better to split up the process into logical chunks, i.e. first get the indices, then the filtered df, then the column, and finally the value

#

that way, once it fails, you know exactly which step caused the error

#

if you do it all in-line, it's harder to see

#

yea, so in my case I had 3 rows in my df

#

so if I do dataframe["Names"] == "Cat", it will tell me on which rows the condition holds

#

you can see that it's True, False, False because it only holds on the first line

steady dome Apr 8, 2020, 7:38 PM

#

oh but I bet I could get the rows from there not too difficult after that point?

reef bone Apr 8, 2020, 7:38 PM

#

and then this series is passed to the loc, and it simply grabs the indices where it's True

#

what we do in the next step is get the rows

#

using this boolean series

#

we can build it ourselves

#

>>> dataframe[[True, False, False]]
  Names  Ages
0   Cat    12

#

>>> dataframe[[True, False, True]]
  Names  Ages
0   Cat    12
2  Bird    36

#

etc

#

it's a two-step process

steady dome Apr 8, 2020, 7:39 PM

#

hey that's kinda cool 😄

reef bone Apr 8, 2020, 7:39 PM

#

yeah, it is cool

steady dome Apr 8, 2020, 7:39 PM

#

sensible use of time to only grab where is True and do the work on those

reef bone Apr 8, 2020, 7:40 PM

#

yeah, exactly

#

I suppose the goal is to get something that feels similar to an SQL select ... where

steady dome Apr 8, 2020, 7:40 PM

#

I hope some day the way that computers vectorize problems becomes intuitive to me because I think in terms of step by step and looping through things so it's not... there yet.

reef bone Apr 8, 2020, 7:41 PM

#

pandas is confusing, but I promise it does get easier

steady dome Apr 8, 2020, 7:42 PM

#

back in (unrelated) school I used xlrd instead of pandas and did nested for loops for a project because I did NOT have a handle on pandas at the time. Even now sometimes I feel just for loops in xlrd is so much easier even if it might be technically slower. My data isn't big enough where that's a make or break for me

reef bone Apr 8, 2020, 7:43 PM

#

yeah, I'm definitely guilty of similar things

#

especially when it's one-off, throwaway code

#

sometimes you just dont want to go through the effort of learning something entirely new

steady dome Apr 8, 2020, 7:44 PM

#

Not when I try and try and spend all that time trying to make the fancy pandas work and give up and do it with xlrd/xlwt and call it a day.

#

Didn't feel great about how I did it but I got the task done and moved on

reef bone Apr 8, 2020, 7:45 PM

#

every solution cannot be the best solution, lol

#

we do what we must

steady dome Apr 8, 2020, 7:46 PM

#

(because we can 🎵 ) (sorry)

steady dome Apr 8, 2020, 8:11 PM

#

Got it to work. Well, this function at least. Thank you so so SO much!

valid drum Apr 8, 2020, 8:50 PM

#

Is that a correct implementation for dropout?

    def dropout(self, x):
        """
        Applies dropout on `x`
        :param x: input array
        """
        shape = x.shape
        noise = np.random.choice([0, 1], shape, replace=True, p=[self.rate, 1-self.rate])
        return x * noise / (1 - self.rate)
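
For what it's worth, that looks like standard "inverted" dropout: each element is kept with probability 1 - rate and the survivors are scaled up by 1 / (1 - rate), so the expected value is unchanged. The piece that usually goes with it is a switch to turn it off at inference time. A standalone sketch under that assumption; the `training` flag and `rng` argument are additions, not part of the original class.

```python
import numpy as np

def dropout(x, rate, training=True, rng=None):
    """Inverted dropout: zero each element with probability `rate` during training."""
    if not 0.0 <= rate < 1.0:
        raise ValueError("rate must be in [0, 1)")
    if not training or rate == 0.0:
        return x                           # identity at inference time
    rng = np.random.default_rng() if rng is None else rng
    keep = 1.0 - rate
    mask = rng.random(x.shape) < keep      # True where the unit survives
    return x * mask / keep                 # rescale so the expected value stays the same

x = np.ones((1000, 100))
out = dropout(x, 0.5, rng=np.random.default_rng(0))
```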

strange stag Apr 8, 2020, 9:56 PM

#

anyone know how i can find the image bounding box coordinates of an image within an image?

lapis sequoia Apr 8, 2020, 10:22 PM

#

Not sure but i think my question belongs here

#

python question

Write a program matrix.py that takes a matrix of integers as input. The program then determines the largest and smallest elements in each row and column. It also calculates the sum of numbers in each row and column. The program prints all these findings as output.

#

Been stuck on this for 5 hours now

#

For input, the program will ask the user the number of rows and columns at first. Then it will ask the user to enter the items row by row, all in one line, each item separated by space. Then the program will process this input and save the items as integers in a 2D list (that is, list of list).

drowsy grove Apr 8, 2020, 10:38 PM

#

Quick Pandas question:
Does anyone know how to rename multilevel column names?
To be more specific: rename one level of column names, if it's "Unnamed..." so that it's the same as the other level column name.

strange stag Apr 8, 2020, 10:38 PM

#

ye

drowsy grove Apr 8, 2020, 10:39 PM

#

Basically, how to get from here:

📎 unknown.png

#

To here

📎 unknown.png

#

I'd like to use just the level1. But some of them are "Unnamed" when level0 hold the names I want them to have.

strange stag Apr 8, 2020, 10:39 PM

#

df[('', 'column_name')] = df[('', 'another_name')] 
df = df.drop(columns=[('', 'another_name')])

#

oh, try reset_index()

drowsy grove Apr 8, 2020, 10:40 PM

#

That's great. What if I have a lot of these columns

#

reset_index() didn't work to my dismay

strange stag Apr 8, 2020, 10:40 PM

#

.reset_index()
been a bit

#

wdym didnt work

lapis sequoia Apr 8, 2020, 10:41 PM

#

Nubonix, would you mind helping me real quick with my thingy?

strange stag Apr 8, 2020, 10:41 PM

#

been a bit since ive last done this*

#

can try

drowsy grove Apr 8, 2020, 10:41 PM

#

Original data is quite messy. But reset_index() gives me something like this

📎 unknown.png

#

Can I write a function to rename the 2nd level names if there is "Unnamed" in it?

#

I forgot that tuple is immutable so I failed.
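
The tuples themselves cannot be edited, but the whole column index can be replaced with a new list built from them. A sketch assuming two column levels where the lower level reads "Unnamed: ..." whenever the real name sits in the upper level; the sample column names are invented since the screenshots are not available here.

```python
import pandas as pd

# Hypothetical two-level header similar to what read_csv/read_excel produces
df = pd.DataFrame(
    [[1, 2, 3]],
    columns=pd.MultiIndex.from_tuples([
        ("Country", "Unnamed: 0_level_1"),
        ("Cases", "Confirmed"),
        ("Cases", "Deaths"),
    ]),
)

# Take level 1, unless it is an "Unnamed" placeholder, in which case fall back to level 0
df.columns = [
    top if str(bottom).startswith("Unnamed") else bottom
    for top, bottom in df.columns
]
```

This flattens the header to a single level in one pass, however many columns there are.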

strange stag Apr 8, 2020, 10:43 PM

#

oh right, well this is kinda hacky... but you could write to a csv and then read it

#

unless u wanna rename every column

#

and then drop the multiindex version of the column

#

there are other ways, but its a pain

#

@lapis sequoia hit me

drowsy grove Apr 8, 2020, 10:44 PM

#

I thought I was smart enough to not have to resort to saving it as csv. It turned out that I spent way more time.

#

Will try

strange stag Apr 8, 2020, 10:44 PM

#

ik its dumb, but it works

#

otherwise u can research multiindex to single index in pandas via google

#

or ask someone else cause i dont wanna cover it 😛

#

well, dono really how, without googling myself..so

drowsy grove Apr 8, 2020, 10:45 PM

#

Will both Google and try the csv route. Thanks.

strange stag Apr 8, 2020, 10:45 PM

#

np, if that doesnt work, ill try to help more

drowsy grove Apr 8, 2020, 10:46 PM

#

Thanks nubonix

lapis sequoia Apr 8, 2020, 10:46 PM

#

@lapis sequoia hit me
@strange stag dm?

strange stag Apr 8, 2020, 10:47 PM

#

ight

opaque stratus Apr 9, 2020, 1:50 AM

#

I have been having some trouble digging deeper into data science. I've tried lots of approaches to learning (Books, MOOCs, Research, Projects)... (I always hear the best way to get into data science is with projects, which is what I am doing as we speak). However, at the end of the day I feel directionless, like I am just repeatedly exploring the shallow perimeters instead of taking a leap into the greater depths. How did you really get into Data Science? I know there are lots of specializations and lots of industries, so perhaps I need to identify what specialization and what industry I want to pursue...

merry wraith Apr 9, 2020, 2:28 AM

#

I feel the same way @opaque stratus . I feel like I never have any direction when learning online and I'm always scraping surface value info only to get bored and move on to the next thing because I'm not getting any value from anything. I found out my company (tech giant) reimburses the cost of some nanodegrees from an online MOOC site. I have been enjoying that more because 1 it provides more structure and 2 I feel like I really need to complete it in order to get reimbursed

#

I'm taking an intro to data science with python course that also teaches sql and I'm enjoying it a lot. ~1hr per day

woven tundra Apr 9, 2020, 7:23 AM

#

I have a really dumb question that I need to ask because I want to make sure I'm thinking about this correctly and I'm not high.

I built a linear regression model after normalizing the variables. I now have to plot the actuals and the predictions (on the same dataset used to build the model) on a scatterplot.

After un-normalizing (or un-transforming) the predictions, there should be a somewhat visually evident linear relationship between it and the un-normalized actuals as well right? Because there is a linear relationship between the normalized actuals and predictions.

grave mango Apr 9, 2020, 9:15 AM

#

Anyone uses xpath helper here ?

agile anvil Apr 9, 2020, 10:22 AM

#

@woven tundra does the linear model not work if you don't normalize? Usually you don't need to normalize if it's an ordinary linear model. How many independents?

woven tundra Apr 9, 2020, 10:26 AM

#

@agile anvil Just 3 independents (that's after testing about 20 more all of which were insignificant but we're not really interested in accuracy, we just want to make a point to a client).

If we don't normalize the R-squared drops from 50% to 18%. Although I didn't test it with just normal min/max scaling.

#

Don't you have to normalize though? A regression model assumes normally distributed independents yeah?

agile anvil Apr 9, 2020, 10:30 AM

#

@woven tundra you're right, sorry https://stats.stackexchange.com/questions/306019/in-linear-regression-why-do-we-often-have-to-normalize-independent-variables-pr/306032

Cross Validated

In linear regression, why do we often have to normalize independent...

I have seen in many books that they will tell you to normalize independent variables in a linear regression before model fitting. My understanding was always one of making sure your betas would be

#

Does the answer there help?

#

what is the polynomial order?

woven tundra Apr 9, 2020, 10:38 AM

#

No worries. I spoke to a more statistically-inclined colleague about my initial message. He agreed that normalization just centers everything and you should see a somewhat linear relationship between your actuals and predictions after you un-normalize it. Of course the strength of the relationship you see depends on the accuracy rate of your model.
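
A quick numerical check of that reasoning on synthetic data (all numbers below are made up): un-normalizing is just a multiply and an add, so the correlation between actuals and predictions is the same before and after.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic data: 3 independents, as in the discussion
X = rng.normal(size=(200, 3))
y = 50 + X @ np.array([3.0, -2.0, 1.0]) + rng.normal(scale=2.0, size=200)

# Standardize the target, fit ordinary least squares on the normalized values
y_mean, y_std = y.mean(), y.std()
y_norm = (y - y_mean) / y_std
A = np.column_stack([np.ones(len(X)), X])          # intercept column + features
coef, *_ = np.linalg.lstsq(A, y_norm, rcond=None)
pred_norm = A @ coef

# Undo the normalization on the predictions
pred = pred_norm * y_std + y_mean

r_norm = np.corrcoef(y_norm, pred_norm)[0, 1]      # correlation in normalized units
r_orig = np.corrcoef(y, pred)[0, 1]                # correlation in original units
```

This only covers a linear rescaling; a non-linear transform such as a log would not carry over the same way.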

#

Not using any polynomials @agile anvil

agile anvil Apr 9, 2020, 11:07 AM

#

no squared or cubed terms?

woven tundra Apr 9, 2020, 11:11 AM

#

Nope

silver igloo Apr 9, 2020, 11:34 AM

#

Hello, I've been programming python for 3 years, How can I start studying data science?

woven tundra Apr 9, 2020, 11:39 AM

#

@silver igloo Plenty of places to start to be honest, how do you learn best? Reading? Online courses? Or just jumping into things and figuring it out?

cunning osprey Apr 9, 2020, 11:50 AM

#

Anyone got any good mathematical sources

#

Like reading about equations and stuff

mild topaz Apr 9, 2020, 1:04 PM

#

I am building a ML model. I am getting training results are as follows.
loss: 0.0071 - acc: 1.0000 - val_loss: 0.1213 - val_acc: 1.0000

#

what can i do for getting proper results and avoid overfitting of model

woven tundra Apr 9, 2020, 1:17 PM

#

plot your learning curves

#

can't go by just a metric

#

If the gap between your training curve and validation curve is extremely wide (and your training curve is very low on the graph), you're overfitting your model

#

Here's a simpler article:

https://rmartinshort.jimdofree.com/2019/02/17/overfitting-bias-variance-and-leaning-curves/

rmartinshort

Overfitting, bias-variance and learning curves

We explore what it means for a machine learning model to generalize well. These concepts are important to keep in mind when thinking about all sorts of supervised machine learning problems.

slate yacht Apr 9, 2020, 3:13 PM

#

Greetings Everyone, I am looking for someone who would like to collaborate on a project with me. I myself, am a very novice programmer, But I have experience in an industry that has given me an idea that could revolutionize said industry. If you are an experienced programmer, with knowledge in data science, feel free to DM me, and we can discuss details

woven tundra Apr 9, 2020, 3:33 PM

#

If you don't mind revealing it publicly, what's the industry? You can be broad if you'd like to keep it confidential

placid gate Apr 9, 2020, 4:12 PM

#

@slate yacht ^

slate yacht Apr 9, 2020, 4:26 PM

#

Auto Transport Brokering

#

Very Simplistic Idea, I just know it will be viable, because it currently doesnt exist in the industry. And if it did exist, It would increase the quality of the service to customers, as well as pay for truck drivers

#

It would almost allow a monopoly in the industry, while improving the overall quality

bronze cipher Apr 9, 2020, 4:29 PM

#

Do we have to be experienced

#

I just want to join for the learning experience

slate yacht Apr 9, 2020, 4:31 PM

#

At least for now, I need to be able to ask questions to an experienced Data Science Person (Python) to see how difficult certain calculations would be in relation to accuracy

#

I have a question, that if someone can answer (without googling or researching), and answer it truthfully, It will let them know that they are indeed qualified for the position that I am looking to fill. Here it is....

#

Do you know the name of the algorithm that can tell you the shortest path from one destination to another on road systems? Do you know how it works? And do you understand the math behind it?

bronze cipher Apr 9, 2020, 4:46 PM

#

KNN - K Nearest Neighbour (or however spell it).

#

But that depends if you trying to end up at the same point you started then thats chinese postman

#

Or are you just trying to get to all nodes/points the most effective way, once

#

Chinese postman is what I think is best for the context you've given me

#

I may be wrong so anyone can correct me

#

Uh yeah kind cut you off, were you going to say something?

slate yacht Apr 9, 2020, 4:52 PM

#

Are you able to double-layer that formula, so that you not only find the shortest distance but also the route for which the sum of the temperatures at each stopping point (cities or towns, for example) is the highest? Say you could go from point A to B or from point A to C, where the temperature at A is 50 degrees, B is 60 degrees, and C is 40 degrees. You would choose the shortest route that also aims for the highest temperature, so if A-B and A-C were both 10 meters, you would still go A-B because the sum of the temperatures is higher.

#

and layer that with even more variables if needed

#

hopefully i explained that right, its hard to fully gather in my mind to explain in words

bronze cipher Apr 9, 2020, 4:54 PM

#

I mean I understand the idea, but doing it in python is something else

#

It looks possible but my skill level is not that high

#

I can understand where you coming from but I can't do something like that

slate yacht Apr 9, 2020, 4:55 PM

#

Ok no worries, if anyone else ends up reading this, and they think this is something they would be interested in discussing, feel free to DM me or message me on instagram @haulerchase

uncut shadow Apr 9, 2020, 5:26 PM

#

well

bronze cipher Apr 9, 2020, 5:27 PM

#

Am I right with the algorithm? It's been a while since I've done Graph theory

spark stag Apr 9, 2020, 5:47 PM

#

dijkstra's path finding algorithm? (had to look up spelling but knew name), i have an implementation that can find path as well as weight of a journey
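For reference, a minimal sketch of Dijkstra's algorithm using Python's `heapq`. The `roads` graph and its weights are made up for illustration, and the function assumes every edge weight is non-negative:

```python
import heapq


def dijkstra(graph, start, goal):
    """Return (total_weight, path) for the cheapest route.

    graph maps each node to a dict of {neighbor: non-negative edge weight}.
    Returns (inf, []) when the goal cannot be reached.
    """
    best = {start: 0}          # cheapest known cost to reach each node
    previous = {}              # back-pointers used to rebuild the path
    queue = [(0, start)]       # min-heap of (cost so far, node)

    while queue:
        cost, node = heapq.heappop(queue)
        if node == goal:
            break
        if cost > best.get(node, float("inf")):
            continue           # stale heap entry, a cheaper route was already found
        for neighbor, weight in graph.get(node, {}).items():
            new_cost = cost + weight
            if new_cost < best.get(neighbor, float("inf")):
                best[neighbor] = new_cost
                previous[neighbor] = node
                heapq.heappush(queue, (new_cost, neighbor))

    if goal not in best:
        return float("inf"), []

    # Walk the back-pointers from goal to start, then reverse.
    path = [goal]
    while path[-1] != start:
        path.append(previous[path[-1]])
    path.reverse()
    return best[goal], path


# Hypothetical road network: weights are distances between towns.
roads = {
    "A": {"B": 10, "C": 4},
    "B": {"D": 2},
    "C": {"B": 3, "D": 9},
    "D": {},
}
print(dijkstra(roads, "A", "D"))  # (9, ['A', 'C', 'B', 'D'])
```

Extra criteria such as the temperature idea above could be folded into the edge weight (for example distance plus a penalty term), as long as the combined weight stays non-negative. That is one possible design, not the only one.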

sharp raven Apr 9, 2020, 6:58 PM

#

I asked this question in the general channel - but they recommended to ask here: "Hey guys.... I need to build a dashboard which can be distributed independent of a server which hosts it... In the past I have made charts using Bokeh which I could distribute as a single HTML file.. I am considering going this route again but also love what Dash can bring.. I have two questions - is Bokeh capable of producing larger single HTML file dashboards without too much speed impact? And as far as I can see it's not possible to generate a single HTML file dashboard with Dash or did any of you succeed in this?"

uncut shadow Apr 9, 2020, 8:05 PM

#

Idk who told you to ask this question here

#

but no, it's not a correct channel lol

#

also, I haven't used Bokeh in my life

worldly ruin Apr 9, 2020, 10:53 PM

#

any idea why my search only works when the string's first character is capital

#

table = df1.loc[df1['itemtype'] == arg]

#

in the data frame, the thing im looking for is 'leather', so when I assign arg to 'leather' it returns no results

#

but if I change 'leather' to 'Leather' in my dataframe and assign arg to 'Leather', it finds all the values

mild topaz Apr 10, 2020, 8:26 AM

#

hey any1 familiar with tensorflow ?

lapis sequoia Apr 10, 2020, 8:39 AM

#

i, myself am not. but go ahead and ask your question

#

dont just ask "can i ask this" or "anyone good with -----". just straight-up ask what you need

mild topaz Apr 10, 2020, 9:16 AM

#

I am building an image detection model using tensorflow. I need to know which layers are suitable and which optimizer to use. I am using passport and driving license images for it

spark stag Apr 10, 2020, 9:53 AM

#

@mild topaz i'm not hugely experienced, but for image recognition you will want a convolutional network, so if it's a 2D coloured image you would want to start with Conv2D, with some pooling layers in there separating them every now and again. in terms of optimizer, just play around with it. my most recent network uses RMSprop and trains to about 90% accuracy on basic image classification, but just try different combinations to see what works

#

also the tensorflow docs have good examples of models they have made for similar tasks so you can look there for ideas / tips

mild topaz Apr 10, 2020, 9:57 AM

#

hey @spark stag hi i am having passport images for train

spark stag Apr 10, 2020, 9:59 AM

#

so by image detection do you mean more like face id?

mild topaz Apr 10, 2020, 9:59 AM

#

see i will explain u my project

#

can i dm u?

spark stag Apr 10, 2020, 10:00 AM

#

yh

echo tendon Apr 10, 2020, 1:29 PM

#

hey guys, i'm totally new in this field, maybe someone can help me. how do i specifically count all values between the two birth years?

#

📎 unknown.png

#

1998<= Year >=1989 <-according to this

bronze cipher Apr 10, 2020, 1:32 PM

#

Wait that doesn't make sense

#

Explain in words

#

You want all the years between 1998 and 1989?

#

and count how many

#

yes?

echo tendon Apr 10, 2020, 1:32 PM

#

these are dates of a festival and i want to check how many people have visited the festival between these birthdays

#

the column is YearsOfBirth

#

10k visitors(lines)

#

as you can see above, about 2.5k are younger than 21

#

I just don't know how to phrase it correctly to search between births.

bronze cipher Apr 10, 2020, 1:35 PM

#

So you trying to count how many people are under the age of 21

#

And between the age of 21 and 30

echo tendon Apr 10, 2020, 1:35 PM

#

yes

#

under 21 I have already solved

bronze cipher Apr 10, 2020, 1:35 PM

#

Oh okay

echo tendon Apr 10, 2020, 1:36 PM

#

as you can see in the first line

bronze cipher Apr 10, 2020, 1:36 PM

#

Yeah I see it now

lapis sequoia Apr 10, 2020, 1:37 PM

#

find out the difference

#

for i in range of difference

#

num = num + i

#

append list

echo tendon Apr 10, 2020, 1:38 PM

#

then it counts the lines of all the birth years in between?

lapis sequoia Apr 10, 2020, 1:38 PM

#

i dont know what you're referring to with lines

#

but that will give you the int for every number between a

#

and b

echo tendon Apr 10, 2020, 1:39 PM

#

i want to count all lines in which these birth years between 1998 and 1989 are registered

lapis sequoia Apr 10, 2020, 1:39 PM

#

for example 100 to 200 is a 100 difference so for i in that difference append year+i aka 100+i to this list

#

what do you mean by lines

echo tendon Apr 10, 2020, 1:39 PM

#

one moment

#

📎 unknown.png

#

you know what I mean? 😄

lapis sequoia Apr 10, 2020, 1:41 PM

#

ok so

#

the list would store

#

all numbers between A and B

#

if you wanted to count lines in between

#

for example 1996

#

would need to have a dict to understand what that value means

#

idk how you did it but if its just a lookup for that year

#

then it should just work

bronze cipher Apr 10, 2020, 1:45 PM

#

birthYear = user_data['YearOfBirth']
betw_21_and_30 = []
for i in range(len(user_data)):
  if birthYear <= 1998 and birthYear>= 1989:
    betw_21_and_30.append()
num = len(betw_21_and_30)

#

Idk try this

echo tendon Apr 10, 2020, 1:46 PM

#

Okay, thanks, guys. I'll try.

bronze cipher Apr 10, 2020, 1:46 PM

#

Lemme know if it works

echo tendon Apr 10, 2020, 1:46 PM

#

kk thanks

#

📎 unknown.png

bronze cipher Apr 10, 2020, 1:48 PM

#

Okay I changed it again

echo tendon Apr 10, 2020, 1:48 PM

#

📎 unknown.png

#

me too

#

😄

bronze cipher Apr 10, 2020, 1:51 PM

#

Okay I made it easier

#

I need to brush on my list comprehension skills

#

Try now

echo tendon Apr 10, 2020, 1:51 PM

#

📎 unknown.png

#

"I need to brush on my list comprehension skills" haha

#

haha

#

thanks for helping me and torturing yourself 😄 I appreciate it 😄

bronze cipher Apr 10, 2020, 1:55 PM

#

It's fine

#

Wait is it a pandas array

echo tendon Apr 10, 2020, 2:00 PM

#

it's just a simple excel file or what do you mean exactly? ^^ I am a real beginner sorry

bronze cipher Apr 10, 2020, 2:11 PM

#

Im asking how did you import the data

#

Did you use pandas or cv2

echo tendon Apr 10, 2020, 2:16 PM

#

pandas

#

says my project partner 😄

bronze cipher Apr 10, 2020, 2:20 PM

#

I think I got it...

#

```
betw_21_and_30 = []

for i in range(len(user_data)):
  birthYear = [user_data['YearOfBirth'][i]]

if birthYear <= 1998 and birthYear>= 1989:
    betw_21_and_30.append()
num = len(betw_21_and_30)
```

#

Hope it works 🤞

echo tendon Apr 10, 2020, 2:22 PM

#

📎 unknown.png

#

😄

bronze cipher Apr 10, 2020, 2:23 PM

#

😔

#

```
for i in range(len(user_data)):
  if user_data['YearOfBirth'][i] <= 1998 and user_data['YearOfBirth'][i] >=1989:
    betw_21_and_30.append()
  num = len(betw_21_and_30)
```

#

If this doesn't work then I give up

echo tendon Apr 10, 2020, 2:27 PM

#

📎 unknown.png

#

thank you anyway for your help!

#

😄

bronze cipher Apr 10, 2020, 2:27 PM

#

Just indentation

#

Thats an easy fix

#

There I fixed it

#

There's hope

echo tendon Apr 10, 2020, 2:28 PM

#

📎 unknown.png

bronze cipher Apr 10, 2020, 2:29 PM

#

It's just indentation errors

#

Okay last chance

#

I fixed it

echo tendon Apr 10, 2020, 2:30 PM

#

📎 unknown.png

#

😦

#

😄

bronze cipher Apr 10, 2020, 2:30 PM

#

Ay there it's solved

#

Just need to:

#

Change:

betw_21_and_30.append()

to:

betw_21_and_30.append(user_data['YearOfBirth'][i])

echo tendon Apr 10, 2020, 2:32 PM

#

📎 unknown.png

#

HELL YEAH!

#

😄

bronze cipher Apr 10, 2020, 2:32 PM

#

AYYYYYYY

#

Lets goooo 😆

echo tendon Apr 10, 2020, 2:32 PM

#

thank you! you're the best!

#

xD

bronze cipher Apr 10, 2020, 2:32 PM

#

Thanks

echo tendon Apr 10, 2020, 2:32 PM

#

nice haha

#

can i give you 5€ or something for a coffee?

#

😄

#

haha

bronze cipher Apr 10, 2020, 2:33 PM

#

No no it's all free

#

If that didn't work then I was going to switch to C++ 😂

#

I mean it's not the best way to write it but it works
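A vectorized alternative, as a minimal sketch: pandas can build the mask for the whole column at once with `Series.between`, which is inclusive on both ends by default. The sample `user_data` frame below is made up; only the `YearOfBirth` column name comes from the code above:

```python
import pandas as pd

# Made-up sample standing in for the festival visitor data.
user_data = pd.DataFrame({"YearOfBirth": [1985, 1989, 1993, 1998, 2001, 1996]})

# Boolean mask: True where 1989 <= YearOfBirth <= 1998 (both ends inclusive).
in_range = user_data["YearOfBirth"].between(1989, 1998)

# Summing a boolean mask counts the True values.
num = int(in_range.sum())
print(num)  # 4
```

This avoids the explicit loop and the intermediate list; `user_data[in_range]` would give the matching rows themselves if they are needed.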

nimble elm Apr 10, 2020, 2:34 PM

#

@bronze cipher is it ok if I pm you ?

bronze cipher Apr 10, 2020, 2:34 PM

#

Uhhh okay

echo tendon Apr 10, 2020, 2:35 PM

#

thank you so much!

#

see ya 😄

bronze cipher Apr 10, 2020, 2:35 PM

#

See ya 😉

nimble elm Apr 10, 2020, 2:46 PM

#

Hey I was wondering if anyone could assist me with an explanation of the math side of a classification algorithm already built? It's for school and I'm struggling alot with it, thanks

echo tendon Apr 10, 2020, 2:52 PM

#

my partner just found another way, if you're interested 😄 @bronze cipher

📎 unknown.png

bronze cipher Apr 10, 2020, 3:12 PM

#

Oh looks so much easier 😔

echo tendon Apr 10, 2020, 3:13 PM

#

but your lines are more spectacular I'll use them 😄

#

hehe

bronze cipher Apr 10, 2020, 3:32 PM

#

You don't have to 😂

hollow quartz Apr 10, 2020, 3:37 PM

#

Hi, how do i keep an only label. All these labels have the same but with differents values. I use pandas.

📎 Capture.PNG

bronze cipher Apr 10, 2020, 3:43 PM

#

Change the first line of your data file

hollow quartz Apr 10, 2020, 3:50 PM

#

why?

#

the labels columns have the same values

bronze cipher Apr 10, 2020, 3:51 PM

#

So change them

#

Do you understand?

#

The first line of a data file is always the column headings

river bough Apr 10, 2020, 3:57 PM

#

"All these labels have the same but with different values" what do you mean?

#

Btw, df.loc (https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.loc.html) selects by label; if you want to access the dataframe using integer positions instead of the column name, use df.iloc

hollow quartz Apr 10, 2020, 4:06 PM

#

hmm, I resolved the problem

#

I changed the index and concatenated

📎 Capture.PNG

sturdy trench Apr 10, 2020, 6:03 PM

#

hey everyone. does anyone know of any discord/slack servers dedicated to data engineering? very curious about new orchestration tools like dagster/prefect and to some extent DBT

placid gate Apr 10, 2020, 7:13 PM

#

^ i would like to know as well

worldly ruin Apr 10, 2020, 9:50 PM

#

Any idea why I might be having trouble with .loc fetching a certain string from my dataframes?

fresh walrus Apr 10, 2020, 9:52 PM

#

is there an error?

worldly ruin Apr 10, 2020, 9:52 PM

#

no, it just doesn't return any results

#

if the value is saved as 'leather' in my df and I use .loc to search for 'leather', it returns nothing

#

but if I change 'leather' to 'Leather' in my df and use .loc to search for 'Leather' it will find everything

fresh walrus Apr 10, 2020, 9:53 PM

#

you're trying to search for values in the dataframe that are leather?

worldly ruin Apr 10, 2020, 9:54 PM

#

yes

fresh walrus Apr 10, 2020, 9:54 PM

#

"Access a group of rows and columns by label(s) "

worldly ruin Apr 10, 2020, 9:54 PM

#

it works for every other variable I have in that column

fresh walrus Apr 10, 2020, 9:54 PM

#

that's what loc does

worldly ruin Apr 10, 2020, 9:54 PM

#

but leather

#

```
            result = df1.loc[df1['itemtype'] == arg]
            await message.channel.send('`' + tabulate(result, headers='keys', tablefmt='simple') + '`')
```

#

that is the code block I am using for every non exception in my discord bot

#

and it works for every other variable (plate, mail, cloth, accessories)

#

it just doesn't work for leather unless its saved as 'Leather'

fresh walrus Apr 10, 2020, 10:05 PM

#

i imagine there's probably a better solution but can you just make everything lowercase?

worldly ruin Apr 10, 2020, 10:06 PM

#

thats the problem, the whole column is lowercase but for some reason 'leather' cant be found

#

only 'Leather'

fresh walrus Apr 10, 2020, 10:06 PM

#

oh sorry

worldly ruin Apr 10, 2020, 10:06 PM

#

im kind of baffled tbh

fresh walrus Apr 10, 2020, 10:07 PM

#

oh what are you passing as arg

worldly ruin Apr 10, 2020, 10:08 PM

#

whatever the user types in discord after !loot

#

```
--  ----------------------------------  ----------  ----------  -------  -------  ----------------------------
32  Corpuscular Leather Greaves         leather     feet        crit     mastery  Carapace of N'Zoth
33  Cord of Anguished Cries             Leather     waist       haste    mastery  Dark Inquisitor Xanesh
34  Gloves of Abyssal Authority         leather     hands       haste    mastery  Drest'agath
35  Spaulders of Aberrant Allure        leather     shoulders   azerite           Il'gynoth, Corruption Reborn
36  Belt of Braided Vessels             Leather     waist       haste    vers     Il'gynoth, Corruption Reborn
37  Stygian Guise                       leather     head        azerite           Maut
38  Boots of Manifest Shadow            leather     feet        haste    mastery  Maut
39  Pauldrons of the Great Convergence  leather     shoulders   azerite           N'Zoth the Corruptor
40  Bracers of Dark Prophecy            leather     wrists      crit     haste    Prophet Skitra
41  Macabre Ritual Pants                leather     legs        crit     vers     Prophet Skitra
42  Gibbering Maw                       leather     head        azerite           Ra-den the Despoiled
43  Wristwraps of Volatile Power        leather     wrists      haste    mastery  Shad'har the Insatiable
44  Chitinspine Gloves                  leather     hands       vers     mastery  The Hivemind
45  Darkheart Robe                      leather     chest       azerite           Vexiona
46  Onyx-Imbued Breeches                leather     legs        vers     mastery  Wrathion, the Black Emperor
```

#

that is the leather portion of the dataframe, those 2 are capitalized on purpose

#

!loot leather returns:

#

```
----------  ----------  ----------  -------  -------  --------
```

#

!loot Leather returns:

#

```
--  -----------------------  ----------  ----------  -------  -------  ----------------------------
33  Cord of Anguished Cries  Leather     waist       haste    mastery  Dark Inquisitor Xanesh
36  Belt of Braided Vessels  Leather     waist       haste    vers     Il'gynoth, Corruption Reborn
```
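One possible explanation, offered as a guess since the raw file isn't shown: the lowercase entries may carry stray whitespace or other invisible characters, so `== arg` never matches them exactly. A minimal sketch, with made-up rows and hypothetical trailing spaces, of how to check for that and compare on stripped values:

```python
import pandas as pd

# Made-up rows; the trailing spaces in "leather " are a hypothetical cause.
df1 = pd.DataFrame({
    "item": ["Stygian Guise", "Cord of Anguished Cries", "Darkheart Robe"],
    "itemtype": ["leather ", "Leather", "leather "],
})
arg = "leather"

# repr() makes invisible padding visible, e.g. 'leather ' vs 'leather'.
print(df1["itemtype"].map(repr).unique())

# Exact comparison misses the padded values...
exact = df1.loc[df1["itemtype"] == arg]

# ...while comparing stripped strings finds them. Case is left untouched,
# so the intentionally capitalized 'Leather' rows stay separate.
cleaned = df1.loc[df1["itemtype"].str.strip() == arg.strip()]

print(len(exact), len(cleaned))  # 0 2
```

If the `repr` output shows no padding, the cause lies elsewhere (for example in how `arg` is parsed from the Discord message), and printing `repr(arg)` would be the next thing to check.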

iron hornet Apr 10, 2020, 10:32 PM

#

what are some possible explanations to gaussian NB having higher accuracy score than KNN

daring locust Apr 11, 2020, 8:27 AM

#

📎 Capture.PNG

#

Can someone help me with this? Why am I getting this error?

ancient marsh Apr 11, 2020, 8:31 AM

#

yeah you probably didn't load the csv right

#

use the delimiter ","

bronze cipher Apr 11, 2020, 8:32 AM

#

Just change the separator

ancient marsh Apr 11, 2020, 8:32 AM

#

@daring locust https://shanelynnwebsite-mid9n9g1q9y8tt.netdna-ssl.com/wp-content/uploads/2018/07/Other-delimiters-Text-file-e1530995690282.png

bronze cipher Apr 11, 2020, 8:33 AM

#

df= pd.read_csv('path/to/file', sep=',')

#

@daring locust
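A minimal sketch of why the separator matters, using made-up semicolon-delimited text in place of a real file (the actual delimiter in the screenshot's file is an assumption):

```python
import io

import pandas as pd

# Made-up data that uses ';' instead of ',' between fields.
raw = "name;age\nAda;36\nLinus;28\n"

# With the default sep=",", each line is read as one fused column.
wrong = pd.read_csv(io.StringIO(raw))

# Passing the delimiter the file really uses splits the columns correctly.
right = pd.read_csv(io.StringIO(raw), sep=";")

print(wrong.shape, right.shape)  # (2, 1) (2, 2)
print(right.columns.tolist())    # ['name', 'age']
```

Opening the file in a text editor is a quick way to see which character actually separates the fields.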

daring locust Apr 11, 2020, 8:35 AM

#

owshit that was a silly mistake

#

I am really new to this, thank you so much guys 🙂

bronze cipher Apr 11, 2020, 8:35 AM

#

No problem

daring locust Apr 11, 2020, 9:12 AM

#

📎 Capture.PNG

#

can some one tell me how to remove these unnamed columns?

bronze cipher Apr 11, 2020, 9:13 AM

#

Name the columns

#

On the first line of your data file

#

Just add what they represent

#

What do they represent though? 🤔

daring locust Apr 11, 2020, 9:15 AM

#

nothing, in my excel and csv file there is nothing in those columns 🤭

ancient marsh Apr 11, 2020, 9:25 AM

#

@daring locust can u manually drop them?

#

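like
```py
df = df.drop('Unnamed: 10', axis=1)  # drop returns a new frame, so assign it back
```

#

or you could do:

for i in range(0, 100):
    # errors="ignore" skips column names that don't exist
    df = df.drop("Unnamed: {}".format(i), axis=1, errors="ignore")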

daring locust Apr 11, 2020, 9:31 AM

#

yes worked, thank you 🙂

daring locust Apr 11, 2020, 10:14 AM

#

📎 Capture.PNG

#

how do I plot this? There is no numerical value

#

any ideas?

bronze cipher Apr 11, 2020, 10:14 AM

#

You can convert it into numbers

#

So how many statuses are there

daring locust Apr 11, 2020, 10:15 AM

#

Only 2

bronze cipher Apr 11, 2020, 10:16 AM

#

So if you want you can edit the dataset

#

maybe make recovered 0 and hospitalized 1

daring locust Apr 11, 2020, 10:17 AM

#

alright, just changing them in excel will be easier right?

bronze cipher Apr 11, 2020, 10:17 AM

#

Yeah sure but there's some way in pandas that's easier

#

I'm not sure how tbh

daring locust Apr 11, 2020, 10:17 AM

#

alright I will google

#

thanks though 🙂

bronze cipher Apr 11, 2020, 10:17 AM

#

No problem
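One way to do the conversion in pandas, as a minimal sketch: `Series.map` with a dict. The column name `status` and the exact label spellings are assumptions, since the real data is only visible in the screenshot:

```python
import pandas as pd

# Made-up sample; 'status' and its two labels are assumed names.
df = pd.DataFrame({"status": ["Recovered", "Hospitalized", "Recovered", "Recovered"]})

# Map each label to a number; labels missing from the dict would become NaN.
df["status_code"] = df["status"].map({"Recovered": 0, "Hospitalized": 1})
print(df["status_code"].tolist())  # [0, 1, 0, 0]

# For a bar or pie chart the conversion may not be needed at all:
# counting the labels already gives numbers to plot.
counts = df["status"].value_counts()
print(counts.to_dict())  # {'Recovered': 3, 'Hospitalized': 1}
```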

daring locust Apr 11, 2020, 10:37 AM

#

@bronze cipher here,

📎 Capture.PNG

bronze cipher Apr 11, 2020, 10:37 AM

#

Cool

#

Like I said I don't know how it's done 😅

daring locust Apr 11, 2020, 10:38 AM

#

yeah this did the job
I thought I will just let you know

bronze cipher Apr 11, 2020, 10:39 AM

#

Oh ok

#

Well ty

serene crane Apr 11, 2020, 11:06 AM

#

Super general question, and I don't at all want to start a holy war, just kind of asking, but why does it seem like exploratory data science overwhelmingly uses a tool like Jupyter Notebooks instead of something more like RStudio and MATLAB, e.g. Spyder? I get that the Notebooks are wonderful for telling an interactive story with data and sharing that story, but aren't they a bit weird for actually doing work in?

daring locust Apr 11, 2020, 11:15 AM

#

I thought so, but when you use something like Jupyter Notebook, you can see the result right after every cell you run, and that helps me a lot
and it is lightweight and handy at the same time. I am really new to this and this is just my observation
Using Matlab and Octave can be daunting, and you can use so many libraries when using something like python @serene crane

#

You can integrate your python code with anything basically

#

and working with other languages and databases like SQL and all seems easier in python

#

📎 download_4.png

#

is there a way to term anything less than say 2% as "other"?

lament cargo Apr 11, 2020, 12:25 PM

#

@daring locust i dont know the syntax but maybe one way to tackle this

Hide labels/values for items <2%
Create label that just says 2% or less and place it where you want

daring locust Apr 11, 2020, 12:28 PM

#

yeah I do not know the syntax too and I cannot find it anywhere

#

searched it a lot

#

still googling
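A minimal sketch of one way to do this in pandas before plotting: compute each category's share with `value_counts(normalize=True)`, then fold everything under the threshold into a single `other` entry. The sample values are made up:

```python
import pandas as pd

# Made-up categorical data: 'c' and 'd' each make up 1% of the rows.
values = pd.Series(["a"] * 60 + ["b"] * 38 + ["c"] * 1 + ["d"] * 1)

# Share of each category, as fractions that sum to 1.
shares = values.value_counts(normalize=True)

threshold = 0.02
small = shares[shares < threshold]

# Keep the large categories and add one combined entry for the rest.
grouped = shares[shares >= threshold].copy()
if not small.empty:
    grouped["other"] = small.sum()

print(grouped.index.tolist())  # ['a', 'b', 'other']
```

`grouped` can then be passed to the plotting call in place of the raw counts, for example `grouped.plot(kind="pie")` if matplotlib is installed.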

echo tendon Apr 11, 2020, 2:58 PM

#

hey guys, does anyone know how I can search for a specific item in this column "ItemName"?

#

because now I have the mean value of "ItemEffectiveTotalCredits" for each item

#

package pandas btw ^^

#

📎 unknown.png

lament cargo Apr 11, 2020, 3:52 PM

#

how do you want to search it?

#

you could say something like

#

transact_data['ItemName'] == 'Insert Item Name Here'

#

or do the super special

#

transact_data[transact_data['ItemName'] == 'Insert Item Name Here']

#

@echo tendon

echo tendon Apr 11, 2020, 4:01 PM

#

thank you
but it should calculate the average value of a certain item from the column "ItemEffectiveTotalCredits" @lament cargo

#

average of the values (second column) in relation to the item from the first column.

#

I hope I can convey it clearly 😄

jolly briar Apr 11, 2020, 4:28 PM

#

@daring locust you can use replace for those kinda subs

#

here's an example:

In [6]: s = '{"PassengerId":{"0":1,"1":2,"2":3,"3":4},"Survived":{"0":0,"1":1,"2":1,"3":1},"Pclass":{"0":3,"1":1,"2":3,"3":1},"Name":{"0":"Brau
   ...: nd, Mr. Owen Harris","1":"Cumings, Mrs. John Bradley (Florence Briggs Thayer)","2":"Heikkinen, Miss. Laina","3":"Futrelle, Mrs. Jacques
   ...:  Heath (Lily May Peel)"}}'

In [7]: df = pd.read_json(s)

In [8]: df
Out[8]:
   PassengerId  Survived  Pclass                                               Name
0            1         0       3                            Braund, Mr. Owen Harris
1            2         1       1  Cumings, Mrs. John Bradley (Florence Briggs Th...
2            3         1       3                             Heikkinen, Miss. Laina
3            4         1       1       Futrelle, Mrs. Jacques Heath (Lily May Peel)

In [9]: df.replace({'Futrelle, Mrs. Jacques Heath (Lily May Peel)': 'something'})
Out[9]:
   PassengerId  Survived  Pclass                                               Name
0            1         0       3                            Braund, Mr. Owen Harris
1            2         1       1  Cumings, Mrs. John Bradley (Florence Briggs Th...
2            3         1       3                             Heikkinen, Miss. Laina
3            4         1       1                                          something

In [10]:

#

@daring locust also, the following can be useful for dropping columns etc:

In [19]: df = pd.DataFrame(dict( drop1 = [1,2], drop2 = [3,4], keep1 = [3,3], keep2=[2,9]))

In [20]: df.loc[ : , ~df.columns.str.contains('drop')]
Out[20]:
   keep1  keep2
0      3      2
1      3      9

lament cargo Apr 11, 2020, 4:51 PM

#

@echo tendon oh i think i understand now, did you figure it out yet?

echo tendon Apr 11, 2020, 4:53 PM

#

no :/ ^^

lament cargo Apr 11, 2020, 4:54 PM

#

so same code

#

but

#

transact_data[transact_data['ItemName'] == 'Insert Item Name Here']['ItemEffectiveTotalCredits'].mean()
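If the mean is needed for every item at once, `groupby` is a common alternative. A minimal sketch with made-up rows (the column names follow the messages above and may not match the real file exactly):

```python
import pandas as pd

# Made-up transactions; the column names are assumed from the conversation.
transact_data = pd.DataFrame({
    "ItemName": ["Beer", "Water", "Beer", "Fries"],
    "ItemEffectiveTotalCredits": [4.0, 2.0, 6.0, 3.5],
})

# One mean per distinct item name, indexed by ItemName.
mean_per_item = transact_data.groupby("ItemName")["ItemEffectiveTotalCredits"].mean()

# Look up a single item in the result.
print(mean_per_item["Beer"])  # 5.0
```

The boolean-mask version above answers the question for one item; the grouped Series answers it for all items with a single pass over the data.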

echo tendon Apr 11, 2020, 4:55 PM

#

📎 unknown.png

#

😄

lament cargo Apr 11, 2020, 4:56 PM

#

lol re run your code

echo tendon Apr 11, 2020, 4:56 PM

#

kk item name here

#

lul

lament cargo Apr 11, 2020, 4:56 PM

#

so that it is a dataframe

echo tendon Apr 11, 2020, 4:56 PM

#

😂

lament cargo Apr 11, 2020, 4:56 PM

#

well that too haha

#

i don't know what you're looking for exactly

echo tendon Apr 11, 2020, 5:14 PM

#

📎 unknown.png

#

😄

#

📎 unknown.png

daring locust Apr 11, 2020, 5:16 PM

#

@jolly briar ty so much 🙂

#

I created an array from the value_counts() series and removed the values under 2% using a lambda expression

jolly briar Apr 11, 2020, 5:18 PM

#

@daring locust not sure what the data is, there's probably an easier way than that

#

if it works for now then all good tho

daring locust Apr 11, 2020, 5:18 PM

#

yes, there must be a better way to do it

#

but ty so much, I learned a new way of doing it

#

thanks

jolly briar Apr 11, 2020, 5:21 PM

#

@daring locust np, post a sample of the data in future and it'll be easier to see what works best

daring locust Apr 11, 2020, 5:34 PM

#

alright

#

is there a way to export jupyter notebook files to pdf