velvet thorn Sep 21, 2020, 11:36 PM

#

assume one row is of shape (5,)

#

options would be of shape, say, (5, 13)

#

you can use broadcasting there

#

and then take .min()

serene scaffold Sep 21, 2020, 11:37 PM

#

broadcasting?

velvet thorn Sep 21, 2020, 11:37 PM

#

hold up, though

#

why are you using mean() instead of sum()

#

I see it doesn't matter in this case

#

just curious

serene scaffold Sep 21, 2020, 11:38 PM

#

maybe I should use sum

#

I can check the assignment again, but I won't ask you about that to keep my question focused on numpy.

#

(to eliminate the pretense that you're doing my homework)

#

I think it needs to be mean though because for the manhattan distance, in this case, we ignore columns where either has a nan

#

so rows with fewer nans get a much higher chance to be closer.

velvet thorn Sep 21, 2020, 11:41 PM

#

I think it needs to be mean though because for the manhattan distance, in this case, we ignore columns where either has a nan
if that's your intention, then fair enough

#

so rows with fewer nans get a much higher chance to be closer.
@serene scaffold why do you say this, though?

#

because assuming all columns are similarly distributed, the means would also have similar distributions, regardless of the number of non-null values in each row

#

if I understand you correctly

serene scaffold Sep 21, 2020, 11:42 PM

#

I think the dataset is fake so I don't know if the distribution of nans is meaningful.

velvet thorn Sep 21, 2020, 11:42 PM

#

maybe I should use sum
@serene scaffold the thing is I'm not sure the question "what is the Manhattan distance between two partially defined points" has a well-defined answer

#

to use a practical example...say you are at (0, 0) on a map. which is closer to you, (3, ?), or (1, 4)?
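
To make the map example concrete, here is a minimal sketch. It assumes the unknown coordinate is stored as NaN and that NaN coordinates are simply skipped; the variable names are made up for illustration.

```python
import numpy as np

origin = np.array([0.0, 0.0])
p = np.array([3.0, np.nan])   # (3, ?) with the unknown coordinate as NaN
q = np.array([1.0, 4.0])      # (1, 4)

# Per-coordinate absolute differences; NaN stays NaN.
diff_p = np.abs(p - origin)
diff_q = np.abs(q - origin)

# Summing only the known coordinates favours the point with missing data.
sum_p, sum_q = np.nansum(diff_p), np.nansum(diff_q)

# Averaging over the known coordinates can flip the ranking.
mean_p, mean_q = np.nanmean(diff_p), np.nanmean(diff_q)

print(sum_p, sum_q, mean_p, mean_q)
```

With the sum, (3, ?) looks closer; with the mean, (1, 4) does. Neither is obviously "the" answer, which is the point being made here.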

#

of course, this is more a question of statistical theory than programming

serene scaffold Sep 21, 2020, 11:44 PM

#

right

velvet thorn Sep 21, 2020, 11:44 PM

#

and I don't claim to know the intentions of whomever set you the homework

#

just thinking out loud

#

anyway, back to your question

serene scaffold Sep 21, 2020, 11:44 PM

#

I can

#

this homework assignment is a trick to show how long data science programs take

#

a lot of people were emailing the TA saying that their code was running overnight

#

and he finally said "that's the point"

velvet thorn Sep 21, 2020, 11:45 PM

#

>>> a = np.random.chisquare(5, size=(5, 3))
>>> a.shape
(5, 3)
>>> a.mean(axis=0).shape
(3,)
>>> a - a.mean(axis=0)
array([[-1.60512055, -1.8917574 , -0.87419636],
       [ 2.46631863,  0.17530674, -0.26093926],
       [ 4.48044859, -0.56073539, -2.20589024],
       [-2.02807185,  0.35415127,  0.15569975],
       [-3.31357484,  1.92303477,  3.18532612]])

#

this is broadcasting

#

note the shapes of the arrays

serene scaffold Sep 21, 2020, 11:46 PM

#

!e

import numpy as np
print(np.random.chisquare(5, size=(5, 3)))

arctic wedgeBOT Sep 21, 2020, 11:46 PM

#

@serene scaffold :white_check_mark: Your eval job has completed with return code 0.

001 | [[ 3.17295805  5.87307768  5.60451568]
002 |  [10.90956275  2.15776996  9.61003041]
003 |  [ 4.06197492  5.80642425  6.10925965]
004 |  [ 3.45856202  1.74811126 10.31556199]
005 |  [ 7.84383268  5.67000452  4.59208444]]

velvet thorn Sep 21, 2020, 11:46 PM

#

the smaller array gets "copied" across the axes

serene scaffold Sep 21, 2020, 11:46 PM

#

I don't know that I understand what math is happening here

velvet thorn Sep 21, 2020, 11:46 PM

#

okay, so first you have an array of random numbers, right?

serene scaffold Sep 21, 2020, 11:47 PM

#

I'm not sure what the lone 5 is for but I see this has a shape of 5 by 3

velvet thorn Sep 21, 2020, 11:47 PM

#

where each row is one sample, and each column is one class of observations.

serene scaffold Sep 21, 2020, 11:47 PM

#

and I'll accept that it's random

velvet thorn Sep 21, 2020, 11:47 PM

#

so for example, one row could be an individual person's data, and one column could be all the, say, heights of the people in the dataset.

#

now, say we want to take the mean of each column; that would be a.mean(axis=0)

#

which would give us an array of shape (3,) (one mean per column)

serene scaffold Sep 21, 2020, 11:48 PM

#

right

velvet thorn Sep 21, 2020, 11:48 PM

#

next, we want to subtract that mean from each value in the original dataset, matching columns

#

so subtract the first mean from every value in the first column, second mean from every value in the second column, etc.

rustic apex Sep 21, 2020, 11:49 PM

#

What parts of Numpy should you concentrate on? Also, what’s the difference between Numpy and Pandas? It seems like a lot ends up relying on pandas, so, should you use/learn both at once?

velvet thorn Sep 21, 2020, 11:49 PM

#

What parts of Numpy should you concentrate on? Also, what’s the difference between Numpy and Pandas? It seems like a lot ends up relying on pandas, so, should you use/learn both at once?
@rustic apex pandas is built on numpy.

#

it depends on what you want to do.

#

pandas is more for higher-level data wrangling.

serene scaffold Sep 21, 2020, 11:49 PM

#

@rustic apex numpy is specifically for math and especially linear algebra, pandas is more for tabular data in general.

velvet thorn Sep 21, 2020, 11:50 PM

#

in particular, data that comes in 2D form.

#

numpy allows you to work in higher dimensions and perform lower-level operations that often don't make sense on tabular data, such as taking the outer product, matrix product, applying functions across strides, etc.

#

so subtract the first mean from every value in the first column, second mean from every value in the second column, etc.
@velvet thorn but remember your original array has shape (5, 3), and your means have shape (3,)

#

which means that a - a.mean(axis=0) shouldn't work (shape mismatch) and yet it does.

#

that's because numpy recognises, in a nutshell, that the arrays match along one axis (the last one) and for every other axis, at least one array has either length 1 or is missing that axis.

#

so it implicitly "expands" the array of means to the shape (5, 3) (conceptually) before performing the elementwise subtraction
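
A small sketch of that "expansion", using np.broadcast_to to make the implicit step visible. The seed and variable names are arbitrary; the lone 5 passed to chisquare is the degrees-of-freedom parameter of the distribution, not part of the shape.

```python
import numpy as np

rng = np.random.default_rng(0)
a = rng.chisquare(5, size=(5, 3))   # 5 samples (rows), 3 columns
means = a.mean(axis=0)              # one mean per column, shape (3,)

# What broadcasting does conceptually: view the (3,) means as (5, 3).
expanded = np.broadcast_to(means, a.shape)

centered = a - means                # broadcasting happens implicitly here
same = np.allclose(centered, a - expanded)

print(means.shape, expanded.shape, centered.shape, same)
```

np.broadcast_to returns a read-only view, so nothing is actually copied; the "copying" is only a mental model.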

rustic apex Sep 21, 2020, 11:52 PM

#

Thank you, I’m interested in using CSV and predicting data. I’m just getting started, it’s clicking with me well

velvet thorn Sep 21, 2020, 11:52 PM

#

the details can be found here https://numpy.org/doc/stable/user/basics.broadcasting.html

#

Thank you, I’m interested in using CSV and predicting data. I’m just getting started, it’s clicking with me well
@rustic apex "using CSV"?

#

what do you mean by that?

rustic apex Sep 21, 2020, 11:52 PM

#

The CSV file?

serene scaffold Sep 21, 2020, 11:52 PM

#

CSV is just comma separated values

rustic apex Sep 21, 2020, 11:53 PM

#

I mean that as in just using data

merry fern Sep 21, 2020, 11:53 PM

#

pandas dataframes...@velvet thorn this is what I got so far with trying to create a new column value based on existing columns, but I can't figure out how to get it to work.

def admin_mapper(df):
    if df['Name'].str.startswith('RP', na=False) or df['Name'].str.startswith('RV', na=False) and df['Price'] == 0:
        return 'Repurchase Agreement'
    elif df['Name'].str.startswith('BUY', na=False) or df['Name'].str.startswith('SELL', na=False):
        return 'CDS'
    elif df['Price'] != "0":
        return 'Bond'
df_admin['Type'] = df_admin.apply(lambda row: admin_mapper(df_admin), axis=1)

velvet thorn Sep 21, 2020, 11:53 PM

#

saying "using CSV" is kind of like "I want to learn driving and I'm excited to use 95 octane petrol"

#

@merry fern yes, I saw your message earlier

#

I'll get to it later.

#

also, you didn't answer my question.

rustic apex Sep 21, 2020, 11:54 PM

#

Haha, yea I mean using data and showing predictions

velvet thorn Sep 21, 2020, 11:54 PM

#

Haha, yea I mean using data and showing predictions
@rustic apex fair enough

merry fern Sep 21, 2020, 11:54 PM

#

thanks, i just updated it. I am not sure where I am going to go with this in terms of other values...

velvet thorn Sep 21, 2020, 11:54 PM

#

I would say focus on pandas, but understand numpy

#

thanks, i just updated it. I am not sure where I am going to go with this in terms of other values...
@merry fern you could return a placeholder or None first.

#

anyway, your code looks more or less right

#

but replace or with |

rustic apex Sep 21, 2020, 11:54 PM

#

How good do you need to be at math, and at which kinds? I taught myself Trig and Pre Cal

serene scaffold Sep 21, 2020, 11:54 PM

#

@velvet thorn I'm not sure I see the application. The manhattan distance formula I'm using ignores indices for which either is nan

merry fern Sep 21, 2020, 11:55 PM

#

ah, and with & ?

#

ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().
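
A sketch of where that ValueError comes from: `or`, `and` and `if` each need a single True/False, but a Series holds one value per row. The sample data below is invented; the element-wise operators `|` and `&` are the usual replacement.

```python
import pandas as pd

df = pd.DataFrame({
    "Name": ["RP 123", "RV 456", "BUY XYZ", "ACME 5Y", None],
    "Price": [0, 0, 101.5, 99.0, 0],
})

starts_rp = df["Name"].str.startswith("RP", na=False)
starts_rv = df["Name"].str.startswith("RV", na=False)

# `starts_rp or starts_rv` would raise the ValueError, because `or`
# needs a single bool. `|` and `&` work element by element instead.
# Parentheses matter: `&` and `|` bind tighter than `==`.
is_repo = (starts_rp | starts_rv) & (df["Price"] == 0)

try:
    bool(starts_rp)          # what `if` / `or` / `and` ask for internally
    raised = False
except ValueError:
    raised = True

print(is_repo.tolist(), raised)
```

The same applies inside a function passed to apply: if the function receives the whole DataFrame instead of one row, every comparison in it produces a Series and `if` fails the same way.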

serene scaffold Sep 21, 2020, 11:55 PM

#

so I assume it requires individual calls to manhattan_distance each time

velvet thorn Sep 21, 2020, 11:56 PM

#

so I assume it requires individual calls to manhattan_distance each time
@serene scaffold you could do it that way, I guess?

#

it would be slower and kind of bad practice

#

so I would suggest using vectorisation in combination with np.nanmean instead

#

then you could entirely cut out the inner loop

serene scaffold Sep 21, 2020, 11:57 PM

#

is there a way to use numpy to get some performance advantage in that case?

velvet thorn Sep 21, 2020, 11:57 PM

#

ah, and with & ?
@merry fern oh wait, I misread your code

#

okay, you know one thing you can do?

#

make a subsidiary DataFrame of conditions

#

then play with that.

#

e.g.

#

columns = ['starts_with_rp', 'starts_with_rv', 'starts_with_buy', 'starts_with_sell', 'zero_price']
conditions_df = pd.concat([
    df['Name'].str.startswith('RP', na=False),
    df['Name'].str.startswith('RV', na=False),
    df['Name'].str.startswith('BUY', na=False),
    df['Name'].str.startswith('SELL', na=False),
    df['Price'] == 0,
], axis=1).rename(columns=columns)

#

then you can just use masking to assign

#

e.g. df.loc[~conditions_df['zero_price'], 'Type'] = 'Bond' would take the place of your last condition

#

do it in reverse order of priority
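
A minimal sketch of "masking in reverse order of priority". It assumes the target column is called Type and uses invented rows; the rule names mirror the three branches of admin_mapper, with parentheses added around the RP/RV test.

```python
import pandas as pd

df = pd.DataFrame({
    "Name": ["RP 123", "BUY XYZ", "ACME 5Y", "RV 9"],
    "Price": [0.0, 101.5, 99.0, 0.0],
})

is_repo = (
    df["Name"].str.startswith("RP", na=False)
    | df["Name"].str.startswith("RV", na=False)
) & (df["Price"] == 0)
is_cds = (
    df["Name"].str.startswith("BUY", na=False)
    | df["Name"].str.startswith("SELL", na=False)
)
is_bond = df["Price"] != 0

# Lowest priority first, so later (higher-priority) rules overwrite it.
df["Type"] = None
df.loc[is_bond, "Type"] = "Bond"
df.loc[is_cds, "Type"] = "CDS"
df.loc[is_repo, "Type"] = "Repurchase Agreement"

print(df)
```

Using .loc with both a row mask and a column label writes only to that column; assigning to df[mask] alone would overwrite whole rows.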

#

is there a way to use numpy to get some performance advantage in that case?
@serene scaffold yup, vectorisation

merry fern Sep 22, 2020, 12:01 AM

#

yikes

velvet thorn Sep 22, 2020, 12:01 AM

#

this kind of uses concepts that you probably haven't been exposed to, which makes it all seem a bit abstract

#

like broadcasting

#

but the main idea is that broadcasting, in general, takes the place of loops in numpy

#

however, from a distance...

#

can you see how using nanmean might help?

#

yikes
@merry fern do you see what I mean?

merry fern Sep 22, 2020, 12:03 AM

#

im trying to interpret it and understand its use case now

velvet thorn Sep 22, 2020, 12:03 AM

#

okay.

#

so basically

merry fern Sep 22, 2020, 12:03 AM

#

would you then combine conditions_df w/ original df ?

velvet thorn Sep 22, 2020, 12:03 AM

#

that creates a new DataFrame with 5 columns

#

and the same index as the original DataFrame

#

and crucially, because of that, you can use it to index the original DataFrame.

#

how about you run it first

#

and look at the result

merry fern Sep 22, 2020, 12:04 AM

#

why is it important that you are re-indexing the dataframe?

#

ok

velvet thorn Sep 22, 2020, 12:04 AM

#

why is it important that you are re-indexing the dataframe?
@merry fern not re-indexing

#

the point is, for example

#

you can do df[conditions_df['starts_with_rp']] to give you all the rows in the original DF where the Name column starts with 'RP'.

#

do you see the utility of that?

merry fern Sep 22, 2020, 12:05 AM

#

absolutely

serene scaffold Sep 22, 2020, 12:05 AM

#

matrix[~np.isnan(matrix[:, j])]

velvet thorn Sep 22, 2020, 12:05 AM

#

and you can combine that with other columns of conditions_df

serene scaffold Sep 22, 2020, 12:06 AM

#

it doesn't work because the mask is a different shape

merry fern Sep 22, 2020, 12:06 AM

#

essentially i was trying to figure out, how do i play with the dataframe now that ive gathered the data

velvet thorn Sep 22, 2020, 12:06 AM

#

to access the rows that you want

merry fern Sep 22, 2020, 12:06 AM

#

right

#

columns=columns is throwing an error

velvet thorn Sep 22, 2020, 12:07 AM

#

oh, right, that won't work

#

uh

#

columns=dict(enumerate(columns)) should work

merry fern Sep 22, 2020, 12:08 AM

#

pycharm saying:
Unexpected type(s):(enumerate[str])Possible types:(Mapping)(Iterable[Tuple[Any, Any]])

velvet thorn Sep 22, 2020, 12:08 AM

#

matrix[~np.isnan(matrix[:, j])]
@serene scaffold huh. it doesn't...?

serene scaffold Sep 22, 2020, 12:08 AM

#

IndexError: boolean index did not match indexed array along dimension 1; dimension is 10 but corresponding boolean dimension is 1

velvet thorn Sep 22, 2020, 12:08 AM

#

pycharm saying:
Unexpected type(s):(enumerate[str])Possible types:(Mapping)(Iterable[Tuple[Any, Any]])
@merry fern that shouldn't be a problem

#

IndexError: boolean index did not match indexed array along dimension 1; dimension is 10 but corresponding boolean dimension is 1

@serene scaffold matrix is 2D, right?

serene scaffold Sep 22, 2020, 12:10 AM

#

let's be sure, I guess

#

yeah it is

velvet thorn Sep 22, 2020, 12:11 AM

#

that...shouldn't be happening

serene scaffold Sep 22, 2020, 12:13 AM

#

(~np.isnan(matrix[:, j])).shape is (8000,1)

merry fern Sep 22, 2020, 12:13 AM

#

hmm. KeyError: 'starts_with_rp'

velvet thorn Sep 22, 2020, 12:13 AM

#

hmm. KeyError: 'starts_with_rp'
@merry fern that means the columns weren't renamed properly

#

just assign conditions_df.columns = columns I guess
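
A likely explanation for why the rename did nothing, sketched on a tiny invented frame: each concatenated Series keeps the name of its source column, so the labels are 'Name', 'Name', 'Price' rather than 0, 1, 2, and rename silently skips keys it cannot find.

```python
import pandas as pd

df = pd.DataFrame({"Name": ["RP 1", "BUY 2"], "Price": [0, 5]})
columns = ["starts_with_rp", "starts_with_buy", "zero_price"]

parts = [
    df["Name"].str.startswith("RP", na=False),
    df["Name"].str.startswith("BUY", na=False),
    df["Price"] == 0,
]

# Each Series keeps the name of the column it came from.
raw = pd.concat(parts, axis=1)
raw_names = list(raw.columns)

# The keys 0, 1, 2 match none of those labels, and rename ignores
# missing keys by default, so nothing changes and no error is raised.
renamed = raw.rename(columns=dict(enumerate(columns)))

# Two ways to get the intended labels:
fixed = raw.copy()
fixed.columns = columns                                  # assign directly
conditions_df = pd.concat(parts, axis=1, keys=columns)   # or name them up front

print(raw_names, list(renamed.columns), list(conditions_df.columns))
```

Passing keys= to pd.concat avoids the intermediate frame with duplicate column labels.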

#

so much for using rename 🤷‍♂️

#

(~np.isnan(matrix[:, j])).shape is (8000,1)
@serene scaffold that's not right

#

it should be shape (8000,)

serene scaffold Sep 22, 2020, 12:14 AM

#

why

velvet thorn Sep 22, 2020, 12:14 AM

#

because you're indexing on the columns

#

so that dimension should disappear

#

if matrix is 2D, matrix[:, j] should be 1D, assuming j is an int

#

>>> a = np.zeros(shape=(8000, 10))
>>> a[:, 1].shape
(8000,)

merry fern Sep 22, 2020, 12:15 AM

#

@velvet thorn, so your code made a list of rules, then made a new df w/ the conditions, then renamed the columns in the same order as the rules, right?

serene scaffold Sep 22, 2020, 12:15 AM

#

idk what to say

velvet thorn Sep 22, 2020, 12:16 AM

#

@velvet thorn, so your code made a list of rules, then made a new df w/ the conditions, then renamed the columns in the same order as the rules, right?
@merry fern yup

#

idk what to say
@serene scaffold you checked matrix.shape, right?

serene scaffold Sep 22, 2020, 12:16 AM

#

yes

velvet thorn Sep 22, 2020, 12:16 AM

#

okay, last resort

#

what version is your numpy

serene scaffold Sep 22, 2020, 12:18 AM

#

let's see

#

1.19.2

velvet thorn Sep 22, 2020, 12:19 AM

#

okay, I'm stumped

serene scaffold Sep 22, 2020, 12:19 AM

#

😮

#

🤯

velvet thorn Sep 22, 2020, 12:19 AM

#

the best guess I can make is that somewhere matrix is getting an extra axis

#

because that wouldn't make sense otherwise

#

if you slice, one axis disappears

#

this is like the most fundamental thing in numpy

#

actually

#

OH

#

I know why now

#

it's because np.where returns a tuple

#

so for j in... is taking the first element of that tuple, which is an array

#

so j is not in fact an int, which leads to a 2D array upon slicing

#

>>> np.where([1, 0, 1])
(array([0, 2]),)
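
A sketch reproducing the shape problem. The original loop is not shown in the chat, so the `for j in np.where(...)` form is an assumption, as is the small matrix.

```python
import numpy as np

matrix = np.arange(12, dtype=float).reshape(4, 3)
matrix[1, 2] = np.nan

# np.where with one argument returns a tuple with one index array per axis.
cols_with_nan = np.where(np.isnan(matrix).any(axis=0))

# Iterating over the tuple yields the index ARRAY, not the ints inside it.
for j in cols_with_nan:
    bad_shape = matrix[:, j].shape        # array index keeps the axis: 2D

# Unpack the tuple first, then iterate over plain integers.
for j in cols_with_nan[0]:
    good_shape = matrix[:, j].shape       # integer index drops the axis: 1D
    rows_kept = matrix[~np.isnan(matrix[:, j])]

print(bad_shape, good_shape, rows_kept.shape)
```

np.flatnonzero(condition) is another way to get the 1D index array directly, without the tuple.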

slender nymph Sep 22, 2020, 12:22 AM

#

I have to find the latest date of the tickers (stock of companies) that have gone bankrupt. I actually want to have all those who no longer publish themselves on the market.

you will see in this table there are a lot of stocks. some no longer publish because they have gone bankrupt. I'd like to have all those who stopped publishing

i know there are 7391 of those who stopped publishing

serene scaffold Sep 22, 2020, 12:22 AM

#

@velvet thorn oh here's something

velvet thorn Sep 22, 2020, 12:22 AM

#

I have to find the latest date of the tickers (stock of companies) that have gone bankrupt. I actually want to have all those who no longer publish themselves on the market.

you will see in this table there are a lot of stocks. some no longer publish because they have gone bankrupt. I'd like to have all those who stopped publishing

i know there are 7391 of those who stopped publishing
@slender nymph define "stopped publishing"

slender nymph Sep 22, 2020, 12:22 AM

#

📎 unknown.png

serene scaffold Sep 22, 2020, 12:22 AM

#

j is an ndarray

#

that's weird

velvet thorn Sep 22, 2020, 12:22 AM

#

yeah

#

that's what I said above actually

serene scaffold Sep 22, 2020, 12:22 AM

#

hmm

velvet thorn Sep 22, 2020, 12:22 AM

#

right before you said that

serene scaffold Sep 22, 2020, 12:23 AM

#

oh, I didn't see it because I have discord in a small window

#

and the other user made a comment

slender nymph Sep 22, 2020, 12:23 AM

#

how can I resolve that

#

can someone help me get started

serene scaffold Sep 22, 2020, 12:24 AM

#

well it's running @velvet thorn

velvet thorn Sep 22, 2020, 12:24 AM

#

yeah, we came to the same realisation

#

anyway, that's why there was that problem

serene scaffold Sep 22, 2020, 12:24 AM

#

so I guess we'll see tomorrow if it worked

velvet thorn Sep 22, 2020, 12:25 AM

#

I forgot about np.where

merry fern Sep 22, 2020, 12:28 AM

#

@velvet thorn, i guess i can't do this:

conditions_df = pd.concat([
    df_admin['Name'].str.startswith('RP', na=False) | df_admin['Name'].str.startswith('RV', na=False),
    df_admin['Name'].str.startswith('BUY', na=False) | df_admin['Name'].str.startswith('SELL', na=False),
    df_admin['Price'] != 0,
], axis=1)

multiple conditions in there

velvet thorn Sep 22, 2020, 12:29 AM

#

you could, but the point is to generate the conditions individually

#

and then use them to create another column

#

that is fine too, though

merry fern Sep 22, 2020, 12:30 AM

#

it didnt work... but i see what youre saying

#

bc then i can reference the various characteristics

velvet thorn Sep 22, 2020, 12:31 AM

#

why didn't it work

#

it should work

merry fern Sep 22, 2020, 12:31 AM

#

when i do my mod the files come out empty

velvet thorn Sep 22, 2020, 12:31 AM

#

your conditions are wrong, I think?

merry fern Sep 22, 2020, 12:33 AM

#

df_admin[conditions_df[['starts_with_rp', 'starts_with_rv', 'starts_with_buy', 'starts_with_sell', 'zero_price']]].to_csv(r'csv_filteroutput' + fntime + '.csv')
anything wrong there?

#

i added a timestamp variable so i can keep running it without having to close the files 🙂

#

if i can DM u ill send u files

velvet thorn Sep 22, 2020, 12:37 AM

#

uh...

#

not sure what you're trying to do there

merry fern Sep 22, 2020, 12:40 AM

#

i was trying to export all the values in that second list

#

but i guess i can just do conditions

velvet thorn Sep 22, 2020, 12:40 AM

#

conditions_df[['starts_with_rp', 'starts_with_rv', 'starts_with_buy', 'starts_with_sell', 'zero_price']].to_csv()?

#

why are you indexing and then saving

merry fern Sep 22, 2020, 12:40 AM

#

yea i have no idea what i was doing there haha

#

i think i left in df_admin accidentally

#

i wanted to see what was in conditions_df, but pycharm limits it to 5 rows, wasn't sure how to print it verbosely

#

@velvet thorn i see now that conditions_df is a df of bools? interesting. how did you learn python/coding? what's your history?

velvet thorn Sep 22, 2020, 12:45 AM

#

uh

#

I went for a coding bootcamp

#

then I worked at a startup for a few months

#

then I worked at another startup for a few months

#

then I worked as a data science instructor for a few months

#

now I'm building my own startup

merry fern Sep 22, 2020, 12:46 AM

#

i guess more what i was asking was when/how did you make the most advancement in your knowledge?

was it all company based projects or on your own?

#

whats your startup?

velvet thorn Sep 22, 2020, 12:46 AM

#

by doing stuff

#

think of something cool to build, build it

#

I learn by doing

#

whats your startup?
@merry fern edutech related, basically

merry fern Sep 22, 2020, 12:47 AM

#

me too. thats what im trying to do now, but keep stumbling

#

cool

rustic apex Sep 22, 2020, 12:47 AM

#

What are some topics to focus on while learning Numpy?

merry fern Sep 22, 2020, 12:48 AM

#

so now I need to write a mapper to utilize conditions_df, or how would you approach this?

velvet thorn Sep 22, 2020, 12:49 AM

#

What are some topics to focus on while learning Numpy?
@rustic apex don't think of topics.

#

think of things you wanna do

#

(IMO)

#

so now I need to write a mapper to utilize conditions_df, or how would you approach this?
@merry fern well.

#

the CLEANEST way

#

would be to create a third DataFrame

#

and join the two

#

that would be my preferred approach.

#

do you understand the concept of "join"?

#

like in the SQL sense

merry fern Sep 22, 2020, 12:50 AM

#

the CLEANEST way
@velvet thorn THIS is where im getting caught up - i keep trying to figure out if what im doing is even efficient or standard

#

yes

velvet thorn Sep 22, 2020, 12:50 AM

#

well

#

you need to have a certain kind of mind

#

to get things intuitively, I guess?

#

but hard work works too

#

gotta understand your strengths and weaknesses

merry fern Sep 22, 2020, 12:50 AM

#

i recently read that iterating is bad, and vectorization is the way to go

velvet thorn Sep 22, 2020, 12:50 AM

#

anyway, you can think of your problem

#

as basically

#

joining on the condition columns.

#

do you understand why?

#

@rustic apex like of course you should understand concepts like vectorisation, broadcasting and indexing...

merry fern Sep 22, 2020, 12:51 AM

#

no i cant really, what do you mean "joining on"

velvet thorn Sep 22, 2020, 12:51 AM

#

...but studying them in isolation is likely to confuse you.

rustic apex Sep 22, 2020, 12:51 AM

#

@velvet thorn well, Pandas makes sense with loading files and how to display certain things, but I’ve just gotten used to arrays with Numpy

velvet thorn Sep 22, 2020, 12:52 AM

#

so it's better to find things that you need to do

#

and learn how to do them with numpy.

#

no i cant really, what do you mean "joining on"
@merry fern okay

#

so

#

conceptually

#

what you want to do

#

okay never mind

#

you understand

#

what a "join" is, right?

#

and you know you join on columns?

merry fern Sep 22, 2020, 12:53 AM

#

im familiar with outer/inner join, but i am a newb @ getting my hands dirty with handling data like this

rustic apex Sep 22, 2020, 12:53 AM

#

I understand it a little

merry fern Sep 22, 2020, 12:58 AM

#

@velvet thorn The problem: I need to go through each row in a database & check 2 columns' values. based on those 2 columns' values, I need to create a new column and give it a value of "1,2,3" - But, yes, you are correct where I think you were going w/ your question before. Ideally, I'd like to be able to digest and act on a variety of data that comes in. I just started with this 3 example because its more than 2 (True/False)

I always just think of for loops to do this but that's not exactly python's way, right?

#

@velvet thorn - I see, both dataframes retained identical indices

velvet thorn Sep 22, 2020, 1:17 AM

#

@merry fern okay, I need to go, but left joins are the solution to your problem

#

you need a third DataFrame to represent the correspondence of the various conditions to your final output value, with, of course, one row per output value (so 3 in this case)

#

then you join that to conditions_df, with conditions_df on the left

#

think about how that would work
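
One way the left join could look, as a sketch under assumptions: the two condition columns and the lookup table below are hypothetical, and the rules are simplified so that anything that is neither a repo nor a CDS counts as a Bond.

```python
import pandas as pd

df = pd.DataFrame({
    "Name": ["RP 123", "BUY XYZ", "ACME 5Y", "SELL ABC"],
    "Price": [0.0, 101.5, 99.0, 98.0],
})

def starts_with(prefix: str) -> pd.Series:
    return df["Name"].str.startswith(prefix, na=False)

conditions_df = pd.DataFrame({
    "is_repo": (starts_with("RP") | starts_with("RV")) & (df["Price"] == 0),
    "is_cds": starts_with("BUY") | starts_with("SELL"),
})

# Hypothetical lookup table: one row per output value.
mapping = pd.DataFrame({
    "is_repo": [True, False, False],
    "is_cds": [False, True, False],
    "Type": ["Repurchase Agreement", "CDS", "Bond"],
})

# The left join keeps every row of conditions_df, in its original order.
joined = conditions_df.merge(mapping, how="left", on=["is_repo", "is_cds"])

# merge returns a fresh 0..n-1 index, so assign by position.
df["Type"] = joined["Type"].to_numpy()

print(df)
```

Any combination of conditions missing from the lookup table comes out as NaN after the join, which makes unhandled cases easy to spot.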

merry fern Sep 22, 2020, 1:20 AM

#

think about how that would work
@velvet thorn yes i was just working on that

#

this is the mapping

📎 unknown.png

#

thanks for the help

serene scaffold Sep 22, 2020, 1:35 AM

#

  return _methods._mean(a, axis=axis, dtype=dtype,
C:\development\school\data_science\venv\lib\site-packages\numpy\core\_methods.py:170: RuntimeWarning: invalid value encountered in double_scalars
  ret = ret.dtype.type(ret / rcount)

#

this is the whole traceback that I get

#

so I don't even know what in my code is causing this error

#

the line before it is an expected print statement

#

huh, I guess it's just a warning

#

wish it gave me more info though

#

def manhattan_distance(x: np.ndarray, y: np.ndarray) -> float:
    not_null = ~np.isnan(x) & ~np.isnan(y) & BOOL_MASK
    x, y = x[not_null], y[not_null]
    return np.mean(np.absolute(x - y))

I'm wondering if using functools.lru_cache will speed this up

#

hashing the two arrays might take too long

#

eh well I guess you can't do that anyway
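
A sketch of why lru_cache does not accept arrays directly, plus one possible workaround: cache on row indices rather than on the arrays themselves. The function cached_distance and the small matrix are hypothetical.

```python
from functools import lru_cache

import numpy as np

matrix = np.array([[1.0, np.nan, 3.0],
                   [2.0, 5.0, np.nan],
                   [0.0, 1.0, 1.0]])

try:
    hash(matrix[0])            # lru_cache needs hashable arguments
    hashable = True
except TypeError:
    hashable = False

# Hypothetical workaround: key the cache on row indices (plain ints).
@lru_cache(maxsize=None)
def cached_distance(i: int, j: int) -> float:
    x, y = matrix[i], matrix[j]
    return float(np.nanmean(np.abs(x - y)))

d1 = cached_distance(0, 1)     # computed
d2 = cached_distance(0, 1)     # served from the cache
hits = cached_distance.cache_info().hits

print(hashable, d1, d2, hits)
```

This only pays off if the same (i, j) pair really is requested repeatedly, and it assumes matrix is not modified between calls.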

hasty grail Sep 22, 2020, 1:54 AM

#

I think you can apply the mask after calculating the absolute value, that way you don't have to do it twice

serene scaffold Sep 22, 2020, 1:56 AM

#

@hasty grail I have to apply the mask first, but the problem is that I have it written in such a way that this function will likely get called on the same pair of arrays a couple times

#

though that number is at most ten

velvet thorn Sep 22, 2020, 4:27 AM

#

this is the mapping
@merry fern looks good

#

so I don't even know what in my code is causing this error
@serene scaffold it says "mean of empty slice"

#

and you have only one call to .mean, so...
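
A sketch that reproduces the warning, assuming it came from a pair of rows with no column where both values are present. The exact wording of the accompanying "invalid value" message differs between NumPy versions.

```python
import warnings

import numpy as np

x = np.array([np.nan, 1.0])
y = np.array([2.0, np.nan])

not_null = ~np.isnan(x) & ~np.isnan(y)     # no column has both values
x_kept, y_kept = x[not_null], y[not_null]  # both end up empty

with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    result = np.mean(np.abs(x_kept - y_kept))

messages = [str(w.message) for w in caught]
print(result, messages)

# A guard that makes the "nothing to compare" case explicit.
safe = np.mean(np.abs(x_kept - y_kept)) if not_null.any() else np.nan
```

The result is NaN either way; the guard just avoids the warning and documents the intent.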

#

def manhattan_distance(x: np.ndarray, y: np.ndarray) -> float:
    not_null = ~np.isnan(x) & ~np.isnan(y) & BOOL_MASK
    x, y = x[not_null], y[not_null]
    return np.mean(np.absolute(x - y))
I'm wondering if using functools.lru_cache will speed this up
@serene scaffold you don't wanna use np.nanmean?

serene scaffold Sep 22, 2020, 4:30 AM

#

@velvet thorn well, I already finished it and got the results, so it's now the TA's problem.

#

I'm not sure what nanmean would afford me in this context?

velvet thorn Sep 22, 2020, 4:31 AM

#

just np.nanmean(np.absolute(x - y)[BOOL_MASK]) (whatever BOOL_MASK is)

#

without masking
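
Spelled out as a full function, the nanmean suggestion could look like the sketch below. BOOL_MASK is never defined in the chat, so here it is assumed to be a boolean vector that drops the last entry.

```python
import numpy as np

N_FEATURES = 4
# Assumption: BOOL_MASK drops the last entry of every vector.
BOOL_MASK = np.ones(N_FEATURES, dtype=bool)
BOOL_MASK[-1] = False

def manhattan_distance(x: np.ndarray, y: np.ndarray) -> float:
    """Mean absolute difference over entries where both x and y are known."""
    # x - y is NaN wherever either input is NaN, so nanmean skips those entries.
    return float(np.nanmean(np.abs(x - y)[BOOL_MASK]))

x = np.array([1.0, np.nan, 3.0, 100.0])
y = np.array([2.0, 5.0, 7.0, -100.0])
d = manhattan_distance(x, y)   # mean of |1 - 2| and |3 - 7|
print(d)
```

The explicit not_null mask becomes unnecessary because NaN propagates through the subtraction and nanmean ignores it.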

serene scaffold Sep 22, 2020, 4:31 AM

#

well I also need to mask the last row every time

#

that's what BOOL_MASK is for

velvet thorn Sep 22, 2020, 4:32 AM

#

okay, so that then

serene scaffold Sep 22, 2020, 4:32 AM

#

though I don't know that that's the bottleneck

#

because the problem is that this function can be called up to nine times for the same x, y

#

it's incredibly unlikely that it would ever reach nine

#

but I have the results

#

so... py_strong

slate hollow Sep 22, 2020, 4:35 AM

#

hey so for tensorflow, in a book i read that you could create a model through the functional api through smth like this:

keras.layers.Dense(30, activation='relu')(input_)

but then later in the book they define something like this:

def call(self, inputs):
    Z = inputs
    for layer in self.hidden:  # hidden is a list of dense layers
        Z = layer(Z)
    return inputs + Z

#

so... what's with the inconsistency? (ping plz)

velvet thorn Sep 22, 2020, 4:38 AM

#

though I don't know that that's the bottleneck
@serene scaffold the point is to remove the inner loop, I guess

#

using vectorisation and broadcasting, as discussed earlier
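
A sketch of what removing the inner loop could look like, on an invented 3x3 matrix: broadcasting computes one row against all rows, or all rows against all rows, in a single expression.

```python
import numpy as np

matrix = np.array([[1.0, np.nan, 3.0],
                   [2.0, 5.0, np.nan],
                   [0.0, 1.0, 1.0]])
row = matrix[0]

# One row against every row: (3, 3) - (3,) broadcasts to (3, 3).
to_all = np.nanmean(np.abs(matrix - row), axis=1)

# Every row against every row: (3, 1, 3) - (1, 3, 3) broadcasts to (3, 3, 3).
pairwise = np.nanmean(np.abs(matrix[:, None, :] - matrix[None, :, :]), axis=2)

print(to_all)
print(pairwise)
```

The all-pairs version builds an n x n x d intermediate array. For 8000 rows and 10 float64 columns that is roughly 5 GB, so the one-row-against-all form, or processing in chunks, is probably the safer starting point.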

serene scaffold Sep 22, 2020, 4:39 AM

#

I might look into that in the future, though for some reason this is the only programming assignment for this course.

velvet thorn Sep 22, 2020, 4:39 AM

#

@slate hollow what do you mean...?

slate hollow Sep 22, 2020, 4:39 AM

#

i mean in both things they're calling the layer

#

but one supposedly tells keras how to connect the layers, while the other applies the functions to the inputs

velvet thorn Sep 22, 2020, 4:40 AM

#

return inputs + Z this is weird to me

#

not really sure what's up with that

slate hollow Sep 22, 2020, 4:40 AM

#

oh um it's like some hypothetical addition thing

#

nothing wrong

velvet thorn Sep 22, 2020, 4:42 AM

#

okay

#

then I don't really see the inconsistency...?

#

in both cases they're applying layers to layers

slate hollow Sep 22, 2020, 4:56 AM

#

i mean

#

the first one takes another layer

#

the second takes an array or something

velvet thorn Sep 22, 2020, 4:57 AM

#

yeah, it takes a number of layers

#

and it successively applies them

slate hollow Sep 22, 2020, 5:00 AM

#

wait what

#

i thought it just took an array of numbers

velvet thorn Sep 22, 2020, 5:00 AM

#

oh, wait, never mind

#

my bad I read wrongly

#

you're right

#

the for loop is the successive application

#

the addition is the last part
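
To show the arithmetic of that call method without TensorFlow, here is a NumPy-only sketch. The Dense class below is a toy stand-in, not the Keras layer, and the sizes are arbitrary. As far as I understand, in Keras the same layer(x) syntax either records a connection (when x is a symbolic input in the functional API) or computes values (when x is real data inside call); this sketch covers only the second case.

```python
import numpy as np

class Dense:
    """Toy stand-in for a dense layer: relu(x @ W + b). Not the Keras class."""

    def __init__(self, n_in: int, n_out: int, rng: np.random.Generator):
        self.W = rng.normal(scale=0.1, size=(n_in, n_out))
        self.b = np.zeros(n_out)

    def __call__(self, x: np.ndarray) -> np.ndarray:
        return np.maximum(x @ self.W + self.b, 0.0)

class ResidualBlock:
    def __init__(self, n_layers: int, n_units: int, rng: np.random.Generator):
        self.hidden = [Dense(n_units, n_units, rng) for _ in range(n_layers)]

    def call(self, inputs: np.ndarray) -> np.ndarray:
        Z = inputs
        for layer in self.hidden:   # successive application of each layer
            Z = layer(Z)
        return inputs + Z           # skip connection: add the input back

rng = np.random.default_rng(0)
block = ResidualBlock(n_layers=2, n_units=4, rng=rng)
x = rng.normal(size=(3, 4))
out = block.call(x)
print(out.shape)
```

The addition only works because every layer keeps the same width, so inputs and Z have matching shapes.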

#

yeah, it takes a number of layers
@velvet thorn the layers are in self.hidden, whatever that is

slate hollow Sep 22, 2020, 5:02 AM

#

yeah..?

#

1094/1094 [==============================] - 11s 10ms/step - loss: 0.5547 - accuracy: 0.8070 - val_loss: 1.9851 - val_accuracy: 0.4737
pain

limpid oak Sep 22, 2020, 5:29 AM

#

I have column values like '6 months', 'months 12' , any method to extract int values from it

velvet thorn Sep 22, 2020, 5:32 AM

#

I have column values like '6 months', 'months 12' , any method to extract int values from it
@limpid oak df['column_name'].str.extract(r'\d+') should work

limpid oak Sep 22, 2020, 5:33 AM

#

thank you @velvet thorn , let me try

velvet thorn Sep 22, 2020, 5:34 AM

#

note that

#

you still need

#

to convert to int if you want to use it as an int

limpid oak Sep 22, 2020, 5:40 AM

#

@velvet thorn ValueError: pattern contains no capture groups

velvet thorn Sep 22, 2020, 5:40 AM

#

r'(\d+)'

limpid oak Sep 22, 2020, 5:41 AM

#

IntValues = int(df['Months'].str.extract(r'\d+'))

#

throwing same error

odd yoke Sep 22, 2020, 5:49 AM

#

You didn't replace the regex

limpid oak Sep 22, 2020, 5:50 AM

#

?

odd yoke Sep 22, 2020, 5:52 AM

#

r'(\d+)'

limpid oak Sep 22, 2020, 5:58 AM

#

this works perfect IntValues = df['Months'].str.extract(r'(\d+)').astype(int)
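
For reference, a self-contained version of the working line, with sample values taken from the question plus one invented row. The new column name MonthsInt is made up.

```python
import pandas as pd

df = pd.DataFrame({"Months": ["6 months", "months 12", "36 months"]})

# The parentheses form the capture group that str.extract requires.
# expand=False returns a Series instead of a one-column DataFrame.
digits = df["Months"].str.extract(r"(\d+)", expand=False)

# Extracted values are still strings; convert them explicitly.
df["MonthsInt"] = digits.astype(int)

print(df)
```

If some rows contain no digits, extract yields NaN for them and astype(int) fails; pd.to_numeric(digits, errors="coerce") is one way to keep those rows as NaN instead.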

#

thank you so much for your kind support @velvet thorn and @odd yoke

odd yoke Sep 22, 2020, 5:58 AM

#

How to get recognition for free 101

mild topaz Sep 22, 2020, 6:21 AM

#

np.ndarray.shape
@hasty grail what i can do with this?

hasty grail Sep 22, 2020, 6:24 AM

#

you get the image's height and width

fossil vapor Sep 22, 2020, 6:45 AM

#

Hi everyone, what could be the reasons for the kernel to appear dead while running a cell in a Jupyter notebook? I use a MacBook Pro, OS = OS X Yosemite version 10.10.5, 2.4 GHz Intel Core 2 Duo.

limpid oak Sep 22, 2020, 6:47 AM

#

it might be taking too much memory to process

#

you can process your data in batches

fossil vapor Sep 22, 2020, 6:50 AM

#

I tried processing in batches and got the same error message about the kernel

limpid oak Sep 22, 2020, 6:51 AM

#

try it on google colab

fossil vapor Sep 22, 2020, 6:52 AM

#

Will try that, Thanks

limpid oak Sep 22, 2020, 6:52 AM

#

welcome

mild topaz Sep 22, 2020, 7:21 AM

#

Traceback (most recent call last):

File "E:\paymentz\template.py", line 8, in <module>
np.ndarray.shape(img)

TypeError: 'getset_descriptor' object is not callable @hasty grail

hasty grail Sep 22, 2020, 8:37 AM

#

img is a np.ndarray

#

so you just do img.shape
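
A short sketch with a stand-in array in place of a real image. The dimensions are invented, and a color image is assumed to be laid out as (height, width, channels).

```python
import numpy as np

# Stand-in for an image loaded elsewhere: 480 rows, 640 columns, 3 channels.
img = np.zeros((480, 640, 3), dtype=np.uint8)

# shape is an attribute of the array instance, not a function to call.
height, width = img.shape[:2]
channels = img.shape[2] if img.ndim == 3 else 1

print(height, width, channels)
```

Calling np.ndarray.shape(img) fails because np.ndarray.shape is the attribute descriptor on the class itself; the value only exists on an actual array.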

mild topaz Sep 22, 2020, 9:09 AM

#

https://discord.com/channels/@me/757573609433727037/757874627736502342 @hasty grail is this possible in python ?

hasty grail Sep 22, 2020, 9:16 AM

#

your link is not working

mild topaz Sep 22, 2020, 9:24 AM

#

see this @hasty grail

📎 unknown.png

hasty grail Sep 22, 2020, 9:26 AM

#

sure it's possible

mild topaz Sep 22, 2020, 9:26 AM

#

if u find any tutorial please do share , i m also looking for this

#

sure it's possible
@hasty grail ok

hasty grail Sep 22, 2020, 9:27 AM

#

what are your results with OpenCV?

mild topaz Sep 22, 2020, 9:29 AM

#

(array([], dtype=int64), array([], dtype=int64))

hasty grail Sep 22, 2020, 9:30 AM

#

do you know what that means?
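
For readers following along: a pair of empty index arrays like the one above is what `np.where` returns when no element satisfies the condition. The sketch below assumes the result came from thresholding a match-score map, which the conversation does not confirm; `scores` and `threshold` are made-up values.

```python
import numpy as np

# Hypothetical match-score map; in template matching this would come from the matcher.
scores = np.array([[0.10, 0.35],
                   [0.42, 0.20]])
threshold = 0.8  # assumed cut-off, higher than every score above

rows, cols = np.where(scores >= threshold)
print((rows, cols))   # two empty index arrays: no location passed the threshold
print(scores.max())   # the best score, useful for choosing a realistic threshold
```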

mild topaz Sep 22, 2020, 9:30 AM

#

actually i am doing different task now

#

see this @hasty grail
template matching

hasty grail Sep 22, 2020, 9:31 AM

#

what are your results with OpenCV?

#

Weren't you trying to do template matching with it?

mild topaz Sep 22, 2020, 9:32 AM

#

yes i was

hasty grail Sep 22, 2020, 9:33 AM

#

and....

mild topaz Sep 22, 2020, 9:37 AM

#

my team mate told me to change the task for now

hasty grail Sep 22, 2020, 9:39 AM

#

ok...

mild topaz Sep 22, 2020, 9:40 AM

#

yes

#

if u find any tutorial pls do share

hasty grail Sep 22, 2020, 10:08 AM

#

Well idk what are you doing now

mild topaz Sep 22, 2020, 10:13 AM

#

can i explain u my problem ?

#

i want to make a template

#

of documents

hasty grail Sep 22, 2020, 10:24 AM

#

Just explain it here

mild topaz Sep 22, 2020, 10:25 AM

#

#data-science-and-ml message

#

i want to make similar to this @hasty grail

hasty grail Sep 22, 2020, 10:26 AM

#

But that image shows template matching as well

#

How is it any different?

mild topaz Sep 22, 2020, 10:26 AM

#

which image

hasty grail Sep 22, 2020, 10:27 AM

#

The one you just linked

mild topaz Sep 22, 2020, 10:27 AM

#

i know ,

#

but i need a tutorial

mild topaz Sep 22, 2020, 10:46 AM

#

please do share if you find any

tidal sonnet Sep 22, 2020, 11:19 AM

#

I understand the first one... But why when theta1 = 0.5, that the point must go thru the 2,1 mark?

📎 unknown.png

#

this is uhm...
Linear regression with one variable

smoky kite Sep 22, 2020, 11:40 AM

#

@tidal sonnet Because if x=2, h(2) = 0.5 x 2 which is 1

hasty grail Sep 22, 2020, 11:40 AM

#

Did you mix up x and y?

#

@mild topaz Sorry, I don't have experience in that specific area of OpenCV

tidal sonnet Sep 22, 2020, 11:46 AM

#

POG!

#

Wait... but how do we know that x is 2 in that case?

#

I was not given a value for x... just told that if theta1 was 0.5, and theta0 was 0, that the line would have to go thru the (2,1) point :(

hasty grail Sep 22, 2020, 11:56 AM

#

the line has an equation y = 1/2 * x

#

Therefore the point (2, 1) resides on that line by definition

#

Any point that fits that equation exactly lies on that line
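
The explanation above can be checked numerically. This is a small sketch, assuming the usual one-variable hypothesis h(x) = theta0 + theta1 * x with theta0 = 0 and theta1 = 0.5 as stated in the question; the function name `h` follows the message from smoky kite.

```python
def h(x, theta0=0.0, theta1=0.5):
    """Hypothesis for one-variable linear regression: h(x) = theta0 + theta1 * x."""
    return theta0 + theta1 * x

# x is not given in advance; the line contains every pair (x, h(x)).
points = [(x, h(x)) for x in range(5)]
print(points)

# (2, 1) is on the line because plugging x = 2 into the equation gives y = 1.
print(h(2))
```

So (2, 1) is just one convenient point to plot; (4, 2) or (1, 0.5) would identify the same line.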

tidal sonnet Sep 22, 2020, 12:13 PM

#

I see... Thank you

merry fern Sep 22, 2020, 12:57 PM

#

@velvet thorn, is this a good approach to creating that 3rd dataframe? its unclear to me how to create the Type column in the original dataframe using the conditions df you suggested and then this logic i put together

df_admin = pd.read_excel(
    filenames['admin'],
    sheets['admin'],
    header=0,
    usecols=[3, 5, 6, 7],
    names=['ISIN', 'Name', 'Price', 'Quantity']
)
columns = ['starts_with_rp', 'starts_with_rv', 'starts_with_buy', 'starts_with_sell', 'zero_price']
conditions_df = pd.concat([
    df_admin['Name'].str.startswith('RP', na=False),
    df_admin['Name'].str.startswith('RV', na=False),
    df_admin['Name'].str.startswith('BUY', na=False),
    df_admin['Name'].str.startswith('SELL', na=False),
    df_admin['Price'] != 0,
], axis=1)
conditions_df.columns = columns
type_mapping_data = {'Type': ['Repurchase Agreement', 'Repurchase Agreement', 'CDS', 'CDS', 'Bond'],
                     'starts_with_rp': [1, 0, 0, 0, 1],
                     'starts_with_rv': [0, 1, 0, 0, 1],
                     'starts_with_buy': [0, 0, 1, 0, 0],
                     'starts_with_sell': [0, 0, 0, 1, 0],
                     'zero_price': [0, 0, 0, 0, 0]
}
# Create the Type column in using df_admin, conditions_df.

#

I'm reading up on pandas.merge now

#

rather than simply joining 2 dfs, i suppose i am trying to join and create a column based on rules?

grave frost Sep 22, 2020, 1:09 PM

#

can Anyone show me real quick how to iterate and replace all elements with some other ones in a column of a Pandas DataFrame that is memory efficient (I have 2.5 Million rows)?

I basically want to iterate over all elements in first row, and then pass each element through a function, the output (it would return the exact value to be put in place, not a variable) of the function to be put in place of the element...

#

I tried something like:-
for item in df[0]: item = numerise(item)

#

But it doesn't work

jaunty scroll Sep 22, 2020, 1:15 PM

#

What is the function? I think using a lambda might work...just a suggestion and possibly not valid

grave frost Sep 22, 2020, 1:22 PM

#

@jaunty scroll function is named numerise

#

Tried a for-loop with replace and it's been going on till now...

jaunty scroll Sep 22, 2020, 1:23 PM

#

Yea I'm really not sure I just thought I'd offer that as a possibility

#

But a lambda should do basically the same thing here from what I can tell

grave frost Sep 22, 2020, 1:25 PM

#

Can you provide some pseudocode on how I should accomplish that?

#

This is what I was using. the count is the control variable since I was thinking that it was stuck in an infinite loop:-
count = int(0)
for item in DataFrame[int(Row)]:
    count += 1
    DataFrame[Row].replace(item, numerise(item), inplace=True)
    if count == 2500000:
        break

tidal bough Sep 22, 2020, 1:27 PM

#

I believe it's also possible to just apply a function to a column in-place.
EDIT: hmm, maybe not
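
One common way to do this is to map the function over the column and assign the result back. The sketch below is only an illustration: `numerise` here is a hypothetical stand-in (the real function was never posted), and the three-row frame is made up.

```python
import pandas as pd

def numerise(text):
    """Hypothetical stand-in for the real function: maps a string to a number."""
    return len(text)

df = pd.DataFrame({0: ["ab", "cde", "f"], 1: [10, 20, 30]})

# Rebinding the loop variable (for item in df[0]: item = ...) never touches the frame.
# map() calls numerise once per element and returns a new Series, which is assigned back.
df[0] = df[0].map(numerise)
print(df)
```

This makes a single pass over the column. Calling `replace` inside a loop rescans the whole column for every item, which is probably why the loop over 2.5 million rows seemed never to finish.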

grave frost Sep 22, 2020, 1:29 PM

#

Well, I thought that this kind of stuff would be easy in Pandas. I can iterate over the required values, but not replace them

jaunty scroll Sep 22, 2020, 1:33 PM

#

trying to figure out how to represent this in a lambda, not sure if this is gonna work

grave frost Sep 22, 2020, 1:34 PM

#

I restarted the whole thing, but still no luck. maybe it's not doing anything. Lemme debug it

jaunty scroll Sep 22, 2020, 1:34 PM

#

but it would be something like output = lambda x, numerise: numerise(x)

#

I just don't know if it would support a single statement after the colon like that

grave frost Sep 22, 2020, 1:36 PM

#

ohk, I tried with 25 values - Function works but it runs it 4 TIMES! so for 25 values it made it go 100 times. This sucks

#

for 2.5 Million rows, I will die

glacial rune Sep 22, 2020, 1:37 PM

#

Is there a way to write to a csv file row by row without including \r\n? I'm importing that CSV file into a SQL database and it's including the \r\n I think

jaunty scroll Sep 22, 2020, 1:38 PM

#

are you using the csv module?

glacial rune Sep 22, 2020, 1:39 PM

#

yes

jaunty scroll Sep 22, 2020, 1:39 PM

#

I am looking back at a script I wrote recently and I just used the w.writerow() command and didn't have to use \r\n

glacial rune Sep 22, 2020, 1:40 PM

#

with open(csv_file, 'w', newline='') as out:
  csv_writer = csv.writer(out)
  csv_writer.writerows(data)

#

something like that

#

but when I open the raw .csv file in notepad++ there's a CR and LF at the end of each line

jaunty scroll Sep 22, 2020, 1:40 PM

#

maybe just run rstrip? kind of a janky fix I guess but it might work
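
An alternative to stripping afterwards: `csv.writer` ends each row with `\r\n` by default, and the `lineterminator` argument changes that. A minimal sketch, using `io.StringIO` as a stand-in for the real output file and made-up rows:

```python
import csv
import io

data = [["id", "name"], [1, "alpha"], [2, "beta"]]

# io.StringIO stands in for open(csv_file, 'w', newline='') so the sketch needs no file.
out = io.StringIO()
csv_writer = csv.writer(out, lineterminator="\n")  # the default is "\r\n"
csv_writer.writerows(data)

print(repr(out.getvalue()))
```

With a real file, keeping `newline=''` in `open()` still matters, since it stops Python from translating the line endings a second time on Windows.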

mild topaz Sep 22, 2020, 1:50 PM

#

how i can do template matching on document images ? in python

grave frost Sep 22, 2020, 1:52 PM

#

Plz can someone provide help on my problem?

merry fern Sep 22, 2020, 2:03 PM

#

@grave frost you can try to get a help room

grave frost Sep 22, 2020, 2:03 PM

#

@merry fern What's that?

merry fern Sep 22, 2020, 2:14 PM

#

@merry fern What's that?
@grave frost on the left bar there is a section that says "python help: available" - you go into an available room and immediately post your question / code and someone will try to help

grave frost Sep 22, 2020, 2:15 PM

#

So it's like a private room for only my problem? Nice feature!

merry fern Sep 22, 2020, 2:30 PM

#

Correct

velvet thorn Sep 22, 2020, 3:13 PM

#

@grave frost what is the function?

grave frost Sep 22, 2020, 3:14 PM

#

@velvet thorn it is a function to convert the string to a numerical format (one-hot encoding)

velvet thorn Sep 22, 2020, 3:14 PM

#

@merry fern you want to combine the original dataframe and conditions_df, then join the result on that onto your third dataframe

#

@velvet thorn it is a function to convert the string to a numerical format (one-hot encoding)
@grave frost show code.

#

and explain exactly what you want to do

merry fern Sep 22, 2020, 3:15 PM

#

@merry fern you want to combine the original dataframe and conditions_df, then join the result on that onto your third dataframe
@velvet thorn not sure how to do that, but im making progress, bc i couldnt figure out the logic in joining dfs, i started playing with another method which works! again, im just not sure the best most efficient way to go about solving the problem :

conditions_admin = [
    df_admin['ISIN'].isnull,
    (((df_admin['Name'].str.startswith('RP', na=False)) | (df_admin['Name'].str.startswith('RV', na=False))) & (df_admin['Price'] == 0)),
    ((df_admin['Name'].str.startswith('BUY', na=False)) | (df_admin['Name'].str.startswith('SELL', na=False))),
    (((~df_admin['Name'].str.startswith('RP', na=False)) | (~df_admin['Name'].str.startswith('RV', na=False))) & (df_admin['Price'] != 0))
]
values_admin = ['Other', 'Repurchase Agreement', 'CDS', 'Bond']
df_admin['Type'] = np.select(conditions_admin, values_admin)

velvet thorn Sep 22, 2020, 3:16 PM

#

sure, if that works for you, go ahead

grave frost Sep 22, 2020, 3:16 PM

#

@velvet thorn Well I am able to do the one-hot encoding, then converting the values to float (since int gives some sort of error) but now the problem is coming in the training phase

merry fern Sep 22, 2020, 3:16 PM

#

actually, the most recent addition, "isnull" is breaking it
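
A likely cause, though the fix was never posted: `df_admin['ISIN'].isnull` without parentheses is a bound method, not a boolean Series, so `np.select` cannot use it as a condition. The sketch below uses made-up rows and calls `isnull()`. It also reads the Bond rule as "neither RP nor RV", which is an assumption, because the original `(~rp) | (~rv)` is true for every name.

```python
import numpy as np
import pandas as pd

# Made-up rows using the column names from the snippet above.
df_admin = pd.DataFrame({
    "ISIN": [None, "X1", "X2", "X3"],
    "Name": ["RP ABC", "RP DEF", "BUY GHI", "JKL 2025"],
    "Price": [0.0, 0.0, 1.5, 99.0],
})

name = df_admin["Name"]
is_repo = name.str.startswith("RP", na=False) | name.str.startswith("RV", na=False)
is_cds = name.str.startswith("BUY", na=False) | name.str.startswith("SELL", na=False)

conditions = [
    df_admin["ISIN"].isnull(),            # called, so this is a boolean Series
    is_repo & (df_admin["Price"] == 0),
    is_cds,
    ~is_repo & (df_admin["Price"] != 0),
]
values = ["Other", "Repurchase Agreement", "CDS", "Bond"]

# np.select takes the first matching condition per row; default covers rows that match none.
df_admin["Type"] = np.select(conditions, values, default="Unclassified")
print(df_admin)
```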

velvet thorn Sep 22, 2020, 3:16 PM

#

I'm not even sure if a join would be the most efficient, but I would say it is the most idiomatic

#

@velvet thorn Well I am able to do the one-hot encoding, then converting the values to float (since int gives some sort of error) but now the problem is coming in the training phase
@grave frost what's the problem

grave frost Sep 22, 2020, 3:17 PM

#

`InvalidArgumentError: 2 root error(s) found.
(0) Invalid argument: indices[0,0] = 2147483647 is not in [0, 18)
[[node sequential/embedding/embedding_lookup (defined at <ipython-input-10-bf624c056173>:13) ]]
(1) Invalid argument: indices[0,0] = 2147483647 is not in [0, 18)
[[node sequential/embedding/embedding_lookup (defined at <ipython-input-10-bf624c056173>:13) ]]
[[Adam/Adam/update/AssignSubVariableOp/_25]]
0 successful operations.
0 derived errors ignored. [Op:__inference_train_function_3133]

Errors may have originated from an input operation.
Input Source operations connected to node sequential/embedding/embedding_lookup:
sequential/embedding/embedding_lookup/2299 (defined at /usr/lib/python3.6/contextlib.py:81)

Input Source operations connected to node sequential/embedding/embedding_lookup:
sequential/embedding/embedding_lookup/2299 (defined at /usr/lib/python3.6/contextlib.py:81)

Function call stack:
train_function -> train_function
`

hasty grail Sep 22, 2020, 3:17 PM

#

looks like integer overflow to me

velvet thorn Sep 22, 2020, 3:17 PM

#

...I feel like something is wrong with your data processing

grave frost Sep 22, 2020, 3:18 PM

#

Hm.. so TF doesn't accept float values?

velvet thorn Sep 22, 2020, 3:18 PM

#

it does

hasty grail Sep 22, 2020, 3:18 PM

#

can you post your code?

velvet thorn Sep 22, 2020, 3:18 PM

#

but I have no idea what your pipeline is like

#

or what your code looks like

#

which is why I said show code

grave frost Sep 22, 2020, 3:19 PM

#

Like this 2.10640103e+37

#

ofc, wait a min

hasty grail Sep 22, 2020, 3:19 PM

#

this is why I always add assertions to my pipelines so things like this don't happen

#

MIN_VAL, MAX_VAL = 0., 1.
tf.debugging.assert_greater_equal(dataset, MIN_VAL)
tf.debugging.assert_less_equal(dataset, MAX_VAL)

grave frost Sep 22, 2020, 3:20 PM

#

`for chunk in pd.read_csv("/content/hash.txt", header=None,chunksize=25000000):
df = pd.concat([df, chunk], ignore_index=True)
df.head

from sklearn.model_selection import train_test_split
train, val = train_test_split(df, test_size=0.1)
print(len(train), 'train examples')
print(len(val), 'validation examples') #(train and val are both DF's')

Batch size

BATCH_SIZE = 1

BUFFER_SIZE = 10000

Length of the vocabulary in chars

vocab = ['\n', ',', '0', '1', '2', '3', '4', '5', '6', '7', '8', '9', 'a', 'b', 'c', 'd', 'e', 'f'] #Vocab List goes here
vocab_size = len(vocab)+1

The embedding dimension

embedding_dim = 12000

dataset = tf.data.Dataset.from_tensor_slices((train[0].values.astype(float), train[1].values)).shuffle(BUFFER_SIZE).batch(BATCH_SIZE, drop_remainder=True)

for feat, targ in dataset.take(5):
print ('Features: {}, Target: {}'.format(feat, targ))

val_dataset = tf.data.Dataset.from_tensor_slices((val[0].values.astype(float), val[1].values)).shuffle(BUFFER_SIZE).batch(BATCH_SIZE, drop_remainder=True)

for feat, targ in val_dataset.take(5): #printing validation data
print ('Features_VAL: {}, Target_VAL: {}'.format(feat, targ))
`

hasty grail Sep 22, 2020, 3:22 PM

#

use 3 backticks

velvet thorn Sep 22, 2020, 3:22 PM

#

less @hasty grail

hasty grail Sep 22, 2020, 3:22 PM

#

for the code block

grave frost Sep 22, 2020, 3:22 PM

#

@hasty grail Thanx for the tip

#

0    object
1     int64
dtype: object

This is the train DataFrame

#

for chunk in pd.read_csv("/content/hash.txt", header=None,chunksize=25000000):
    df = pd.concat([df, chunk], ignore_index=True)
df.head

from sklearn.model_selection import train_test_split
train, val = train_test_split(df, test_size=0.1)
print(len(train), 'train examples')
print(len(val), 'validation examples')  #(train and val are both DF's')

# Batch size
BATCH_SIZE = 1

BUFFER_SIZE = 10000

# Length of the vocabulary in chars
vocab = ['\n', ',', '0', '1', '2', '3', '4', '5', '6', '7', '8', '9', 'a', 'b', 'c', 'd', 'e', 'f']     #Vocab List goes here
vocab_size = len(vocab)+1

# The embedding dimension
embedding_dim = 12000

dataset = tf.data.Dataset.from_tensor_slices((train[0].values.astype(float), train[1].values)).shuffle(BUFFER_SIZE).batch(BATCH_SIZE, drop_remainder=True)

for feat, targ in dataset.take(5):
  print ('Features: {}, Target: {}'.format(feat, targ))

val_dataset = tf.data.Dataset.from_tensor_slices((val[0].values.astype(float), val[1].values)).shuffle(BUFFER_SIZE).batch(BATCH_SIZE, drop_remainder=True)

for feat, targ in val_dataset.take(5):                 #printing validation data
  print ('Features_VAL: {}, Target_VAL: {}'.format(feat, targ))

#

@velvet thorn @hasty grail

velvet thorn Sep 22, 2020, 3:24 PM

#

actually, the most recent addition, "isnull" is breaking it
@merry fern okay let me explain the join approach

merry fern Sep 22, 2020, 3:24 PM

#

i fixed it, but i would like to understand what you were talking about last night

hasty grail Sep 22, 2020, 3:26 PM

#

for chunk in pd.read_csv("/content/hash.txt", header=None,chunksize=25000000):
    df = pd.concat([df, chunk], ignore_index=True)

Does this even work? You would be trying to concat None and a DataFrame together in the first iteration

grave frost Sep 22, 2020, 3:27 PM

#

Well, df.head gives a good output

#

shld I reduce the chunkSize?

hasty grail Sep 22, 2020, 3:28 PM

#

what does df look like?

#

There's no point using a chunksize since you're loading the entire thing into memory anyway
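
To illustrate the difference: `chunksize` only saves memory when each chunk is reduced or consumed before the next one is read. A rough sketch with a tiny in-memory stand-in for the file (the rows and the running totals are made up for illustration):

```python
import io
import pandas as pd

# Small in-memory stand-in for the real file; two columns, no header row.
raw = io.StringIO("aa11,1\nbb22,2\ncc33,3\ndd44,4\nee55,5\n")

# Option 1: the whole file is wanted in memory anyway, so skip chunksize.
df = pd.read_csv(raw, header=None)

# Option 2: the file does not fit, so reduce each chunk before the next one is read.
raw.seek(0)
row_count = 0
target_sum = 0
for chunk in pd.read_csv(raw, header=None, chunksize=2):
    row_count += len(chunk)
    target_sum += int(chunk[1].sum())

print(df.shape, row_count, target_sum)
```

Concatenating every chunk back into one frame ends up holding the full dataset, so it has the same peak memory problem as option 1.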

velvet thorn Sep 22, 2020, 3:29 PM

#

@merry fern

#

>>> df
    cond_1  cond_2  cond_3
0    False    True    True
1    False    True    True
2    False   False   False
3    False    True    True
4     True    True   False
5     True   False   False
6     True    True   False
7     True    True   False
8    False    True   False
9     True    True    True
10    True   False    True
11   False   False    True
12    True    True    True
13    True    True   False
14   False   False    True
15    True   False    True
>>> indicator_df
   cond_1  cond_2  cond_3 category
0    True    True    True    CAT_A
1   False   False    True    CAT_B
2    True   False   False    CAT_C
>>> conditions = ['cond_1', 'cond_2', 'cond_3']
>>> pd.merge(df, indicator_df, how='left', left_on=conditions, right_on=conditions)
    cond_1  cond_2  cond_3 category
0    False    True    True      NaN
1    False    True    True      NaN
2    False   False   False      NaN
3    False    True    True      NaN
4     True    True   False      NaN
5     True   False   False    CAT_C
6     True    True   False      NaN
7     True    True   False      NaN
8    False    True   False      NaN
9     True    True    True    CAT_A
10    True   False    True      NaN
11   False   False    True    CAT_B
12    True    True    True    CAT_A
13    True    True   False      NaN
14   False   False    True    CAT_B
15    True   False    True      NaN

#

(sorry, I know there are two conversations going on and this is a huge chunk)

grave frost Sep 22, 2020, 3:29 PM

#

@hasty grail When loading the entire thing in memory it doesn't load due to the lack of RAM

#

[2499999 rows x 2 columns]>
AND:-
0    object
1     int64
dtype: object

hasty grail Sep 22, 2020, 3:31 PM

#

What object are they exactly?

grave frost Sep 22, 2020, 3:32 PM

#

[0] is supposed to look like a long list of numbers : `3790673563025180902423922202540363554017`

#

Dunno why object. I did try to convert but got error

#

So in the pipeline step I use .astype(float)

velvet thorn Sep 22, 2020, 3:33 PM

#

that's too big for int64
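
A quick check in plain Python, using the sample value from the message above. It also shows why `.astype(float)` is not a safe workaround: the magnitude fits, but the digits do not.

```python
value = 3790673563025180902423922202540363554017  # the sample from the message above
INT64_MAX = 2**63 - 1                             # 9223372036854775807

print(value > INT64_MAX)    # True: the value needs far more than 64 bits
print(value.bit_length())   # number of bits needed to store it exactly

# A float can hold the magnitude but only about 15 to 17 significant decimal digits,
# so the round trip through float does not give the original integer back.
as_float = float(value)
print(int(as_float) == value)
```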

grave frost Sep 22, 2020, 3:34 PM

#

@velvet thorn well, it's a long string encoded in numbers

velvet thorn Sep 22, 2020, 3:34 PM

#

might help if you explained your problem in detail

#

like why you're doing what you're doing

merry fern Sep 22, 2020, 3:34 PM

#

@merry fern
@velvet thorn have to stop you for a second and thank you for chatting with me about this, between last night and today made some major leaps in processing the data. you're awesome! thank you

velvet thorn Sep 22, 2020, 3:34 PM

#

yw 🙂

hasty grail Sep 22, 2020, 3:34 PM

#

Can you set the dtype parameter to {'first_col': str, 'second_col': np.int32}?

grave frost Sep 22, 2020, 3:35 PM

#

@velvet thorn I am making a seq2seq model that tries to find out the relationship b/w a string(encoded) and a number.

hasty grail Sep 22, 2020, 3:35 PM

#

the long list should then be parsed as a string

merry fern Sep 22, 2020, 3:35 PM

#

I just found that it helps to create a new python file with only the code I am working on to isolate that and then bring it back into the full file once it works accordingly

velvet thorn Sep 22, 2020, 3:35 PM

#

anyway, the dataframes above should illustrate how to combine the conditions with the "indicator dataframe" (which is a mapping from conditions to intended category)

hasty grail Sep 22, 2020, 3:35 PM

#

while the second column would be parsed as int32 so you don't have to cast it to int32 when using TF

velvet thorn Sep 22, 2020, 3:35 PM

#

imagine also that df has additional columns which are your other features that the conditions are derived from (they won't be affected by the join)

grave frost Sep 22, 2020, 3:35 PM

#

Wait how do I define the dtypes?

merry fern Sep 22, 2020, 3:35 PM

#

im reading

hasty grail Sep 22, 2020, 3:35 PM

#

it's a parameter in read_csv
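
A short sketch of what that could look like, assuming the file has no header row so the columns are labelled 0 and 1. The in-memory text stands in for the real file, and the pairing of the first digit string with the target 1 is made up.

```python
import io
import pandas as pd

# In-memory stand-in for the file: a long digit string and a small integer per row.
raw = io.StringIO(
    "3790673563025180902423922202540363554017,1\n"
    "9128165362313010342475980190211245832428,2499996\n"
)

# With header=None the columns are labelled 0 and 1, so those are the dtype keys.
df = pd.read_csv(raw, header=None, dtype={0: str, 1: "int32"})

print(df.dtypes)
print(df[0].str.len().tolist())  # every digit survives because nothing was parsed as a number
```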

velvet thorn Sep 22, 2020, 3:35 PM

#

what's your pandas version @grave frost

hasty grail Sep 22, 2020, 3:36 PM

#

https://pandas.pydata.org/pandas-docs/stable/user_guide/io.html

merry fern Sep 22, 2020, 3:36 PM

#

okay, i see what youre saying, but how do you then join the newly added column to the original data instead of to the conditions df?

grave frost Sep 22, 2020, 3:36 PM

#

1.0.5

velvet thorn Sep 22, 2020, 3:36 PM

#

okay, i see what youre saying, but how do you then join the newly added column to the original data instead of to the conditions df?
@merry fern "newly added"?

#

oh.

#

okay so basically df here is the concatenation (pd.concat) of the original DataFrame with your features and conditions_df

#

then you join that on indicator_df

merry fern Sep 22, 2020, 3:37 PM

#

df doesn't contain bools, it contains strings and floats, so i dont follow how to merge them

velvet thorn Sep 22, 2020, 3:38 PM

#

@hasty grail When loading the entire thing in memory it doesn't load due to the lack of RAM
@grave frost why don't you train with an iterator

#

that loads on demand?

#

df doesn't contain bools, it contains strings and floats, so i dont follow how to merge them
@merry fern you're only joining on the condition columns

#

which we produced earlier, remember?

#

those are bools

#

all other columns do not participate in the join, but merely appear in the result unchanged
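
Putting the pieces of this explanation together, as a sketch with made-up rows: the feature columns and the boolean condition columns are placed side by side with `pd.concat(axis=1)`, and the merge keys are only the condition columns. The column names follow the earlier `df_admin` example; the data and the two-rule mapping are assumptions.

```python
import pandas as pd

# Made-up feature rows; the names follow the earlier df_admin example.
features = pd.DataFrame({
    "Name": ["RP ABC", "BUY DEF", "XYZ 2030"],
    "Price": [0.0, 1.5, 99.0],
})

# Boolean condition columns derived from the features.
conditions_df = pd.DataFrame({
    "starts_with_rp": features["Name"].str.startswith("RP", na=False),
    "starts_with_buy": features["Name"].str.startswith("BUY", na=False),
})
conditions = list(conditions_df.columns)

# Mapping from a combination of condition values to the intended category.
indicator_df = pd.DataFrame({
    "starts_with_rp": [True, False],
    "starts_with_buy": [False, True],
    "Type": ["Repurchase Agreement", "CDS"],
})

# Put features and conditions side by side, then join on the condition columns only.
df = pd.concat([features, conditions_df], axis=1)
result = pd.merge(df, indicator_df, how="left", on=conditions)
print(result[["Name", "Price", "Type"]])
```

Rows whose combination of conditions has no entry in the mapping come back with NaN in `Type`, as in the larger example earlier.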

hasty grail Sep 22, 2020, 3:39 PM

#

@grave frost Actually, since you're using TF anyway

#

why not use this?

grave frost Sep 22, 2020, 3:39 PM

#

@velvet thorn I don't know much about them. But whenever I load csv it will run out of memory

hasty grail Sep 22, 2020, 3:39 PM

#

https://www.tensorflow.org/api_docs/python/tf/data/experimental/CsvDataset?hl=en

velvet thorn Sep 22, 2020, 3:39 PM

#

@velvet thorn I don't know much about them. But whenever I load csv it will run out of memory
there's a way to load part of the dataset, train on that batch, then load another part, repeat
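
The idea of loading on demand can be sketched with a plain generator. This uses only the standard library and is not the TensorFlow API; `iter_batches` is a hypothetical helper and the in-memory text stands in for a large file.

```python
import csv
import io
from itertools import islice

def iter_batches(handle, batch_size):
    """Yield lists of parsed CSV rows, reading only batch_size rows at a time."""
    reader = csv.reader(handle)
    while True:
        batch = list(islice(reader, batch_size))
        if not batch:
            return
        yield batch

# In-memory stand-in for a large file on disk.
source = io.StringIO("aa11,1\nbb22,2\ncc33,3\ndd44,4\nee55,5\n")

sizes = []
for batch in iter_batches(source, batch_size=2):
    # A real loop would run one training step on the batch here (hypothetical).
    sizes.append(len(batch))

print(sizes)
```

Only one batch is held in memory at any time, so peak usage depends on `batch_size` and not on the size of the file.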

grave frost Sep 22, 2020, 3:40 PM

#

@hasty grail I tried that, but again memory problem

hasty grail Sep 22, 2020, 3:40 PM

#

it also supports shuffling so you can do that without using sklearn

#

did you set the buffer size?

grave frost Sep 22, 2020, 3:41 PM

#

How much shld I set it to?

hasty grail Sep 22, 2020, 3:41 PM

#

depends on how much memory you can afford

grave frost Sep 22, 2020, 3:41 PM

#

12Gb

hasty grail Sep 22, 2020, 3:42 PM

#

I think you can use the default buffer size when initing the dataset

#

but then use a larger buffer size for the shuffle method, which is in terms of number of elements

grave frost Sep 22, 2020, 3:43 PM

#

Trying with 10

#

@hasty grail @velvet thorn Is this valid? looks empty to me <CsvDatasetV2 shapes: ((), ()), types: (tf.int64, tf.int64)>

hasty grail Sep 22, 2020, 3:45 PM

#

you need to set a valid type for the columns

#

the x values are likely to overflow given how long they are

#

what do they even represent?

grave frost Sep 22, 2020, 3:46 PM

#

so tf.float()?

#

<PaddedBatchDataset shapes: ((1,), (1,)), types: (tf.float32, tf.int64)>

hasty grail Sep 22, 2020, 3:47 PM

#

[0] is supposed to look like a long list of numbers : 3790673563025180902423922202540363554017
Can you explain what this means?

grave frost Sep 22, 2020, 3:47 PM

#

It's a string represented by numbers

hasty grail Sep 22, 2020, 3:47 PM

#

so it's a string

grave frost Sep 22, 2020, 3:47 PM

#

yep

#

converted

hasty grail Sep 22, 2020, 3:48 PM

#

then you should specify the dtype as tf.string

#

right?

grave frost Sep 22, 2020, 3:48 PM

#

Yeah, but I converted to int so that model can understand that

#

models work only on integers, right?

hasty grail Sep 22, 2020, 3:48 PM

#

what are you trying to do?

grave frost Sep 22, 2020, 3:48 PM

#

I am making a seq2seq model that tries to find out the relationship b/w a string(encoded) and a number.

hasty grail Sep 22, 2020, 3:49 PM

#

oh

grave frost Sep 22, 2020, 3:49 PM

#

and with that relationship predict some more numbers for the given encoded string

hasty grail Sep 22, 2020, 3:49 PM

#

then you should transform that sequence of numbers into a sequence of one-hot encoded vectors

grave frost Sep 22, 2020, 3:49 PM

#

Unfortunately, it's alphanumeric

hasty grail Sep 22, 2020, 3:49 PM

#

it still holds

#

you can still one-hot encode it
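
A minimal sketch of one-hot encoding a string character by character, assuming the alphabet is the hex characters from the vocab list posted earlier; `VOCAB` and `one_hot_sequence` are illustrative names, not part of any library.

```python
import numpy as np

VOCAB = "0123456789abcdef"  # assumed alphabet: the hex characters from the vocab list above
CHAR_TO_INDEX = {ch: i for i, ch in enumerate(VOCAB)}

def one_hot_sequence(text):
    """Return an array of shape (len(text), len(VOCAB)) with a single 1 per row."""
    indices = [CHAR_TO_INDEX[ch] for ch in text]
    encoded = np.zeros((len(text), len(VOCAB)), dtype=np.float32)
    encoded[np.arange(len(text)), indices] = 1.0
    return encoded

encoded = one_hot_sequence("3790a")
print(encoded.shape)
print(encoded.argmax(axis=1))
```

Each character becomes its own small vector, so the model sees a sequence of bounded values and no single number ever has to fit into int64 or a float.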

grave frost Sep 22, 2020, 3:50 PM

#

but what is the problem in numbers?

hasty grail Sep 22, 2020, 3:50 PM

#

it's too big

grave frost Sep 22, 2020, 3:50 PM

#

but a float is not

#

does TF automatically convert the tf.string to nums?

arctic wedgeBOT Sep 22, 2020, 3:51 PM

#

You are not allowed to use that command here. Please use the #bot-commands channel instead.

desert oar Sep 22, 2020, 3:52 PM

#

@hasty grail the bot doesnt have numpy installed

grave frost Sep 22, 2020, 3:52 PM

#

NameError: name 'iiinfo' is not defined

hasty grail Sep 22, 2020, 3:52 PM

#

It does, actually

pale thunder Sep 22, 2020, 3:52 PM

#

I am pretty sure it does

desert oar Sep 22, 2020, 3:53 PM

#

no kidding

#

!e ```python
import numpy as np
print( np.array([1,2,3]) )

arctic wedgeBOT Sep 22, 2020, 3:53 PM

#

@desert oar :white_check_mark: Your eval job has completed with return code 0.

[1 2 3]

desert oar Sep 22, 2020, 3:53 PM

#

looks like im mistaken, thats very helpful to know

#

!e ```python
import pandas as pd
print(pd.Series([1,2,3]))

arctic wedgeBOT Sep 22, 2020, 3:53 PM

#

@desert oar :white_check_mark: Your eval job has completed with return code 0.

001 | 0    1
002 | 1    2
003 | 2    3
004 | dtype: int64

desert oar Sep 22, 2020, 3:53 PM

#

😮

pale thunder Sep 22, 2020, 3:53 PM

#

also networkx and forbiddenfruit

hasty grail Sep 22, 2020, 3:54 PM

#

import numpy as np
print(np.iinfo(np.int64).max)
print(np.finfo(np.float64).max)

desert oar Sep 22, 2020, 3:54 PM

#

!e ```python
import numpy as np
print(np.iinfo(np.int64).max)
print(np.finfo(np.float64).max)

arctic wedgeBOT Sep 22, 2020, 3:54 PM

#

@desert oar :white_check_mark: Your eval job has completed with return code 0.

001 | 9223372036854775807
002 | 1.7976931348623157e+308

grave frost Sep 22, 2020, 3:54 PM

#

seems big enough for my purpose

desert oar Sep 22, 2020, 3:54 PM

#

note that the numpy max isnt necessarily the same as the python max

hasty grail Sep 22, 2020, 3:54 PM

#

what error function are you planning to use for this?

grave frost Sep 22, 2020, 3:55 PM

#

Ok no it isn't

desert oar Sep 22, 2020, 3:55 PM

#

why do you need numbers that are so large? what are you doing exactly?

pale thunder Sep 22, 2020, 3:55 PM

#

you can also get unbounded integers in numpy with the object dtype and using python ints at a pretty severe performance hit

desert oar Sep 22, 2020, 3:55 PM

#

i see some stuff about tensorflow above

grave frost Sep 22, 2020, 3:55 PM

#

sparse_categorical_crossentropy

desert oar Sep 22, 2020, 3:55 PM

#

that's returning numbers that are on the order of 1e300?

hasty grail Sep 22, 2020, 3:55 PM

#

I am making a seq2seq model that tries to find out the relationship b/w a string(encoded) and a number.

#

Can you provide an example?

grave frost Sep 22, 2020, 3:56 PM

#

9128165362313010342475980190211245832428 --> 2499996

desert oar Sep 22, 2020, 3:56 PM

#

yeah i would need to see an example too. but if it's seq2seq i'd imagine you would want to encode the number as a string of digits rather than a number

#

are you the one who was trying to use ML to reverse hashes a while ago?

#

is that what these are?

grave frost Sep 22, 2020, 3:57 PM

#

@desert oar do models work with tf.string?

#

yep

merry fern Sep 22, 2020, 3:57 PM

#

so lets say i have 2 dataframes i want to combine, but in the new combined dataframe, i want each dataframe to have an index from its source...

desert oar Sep 22, 2020, 3:57 PM

#

i dont know what that question is supposed to mean

hasty grail Sep 22, 2020, 3:57 PM

#

The fact that you're using sparse_categorical_crossentropy implies that there's a (relatively) small set of output values

desert oar Sep 22, 2020, 3:57 PM

#

@merry fern be careful, what does "combine" mean here?

hasty grail Sep 22, 2020, 3:57 PM

#

But your output is 2499996...

grave frost Sep 22, 2020, 3:57 PM

#

There is a small set

merry fern Sep 22, 2020, 3:57 PM

#

@merry fern be careful, what does "combine" mean here?
@desert oar not concat, but to add the rows together

desert oar Sep 22, 2020, 3:57 PM

#

@grave frost cross entropy is for categorical targets. you are not going to have 1 target for literally every ~~real number~~ integer

grave frost Sep 22, 2020, 3:57 PM

#

It's just a combination of different classes including integers

merry fern Sep 22, 2020, 3:57 PM

#

so i think merge

hasty grail Sep 22, 2020, 3:57 PM

#

Can you describe the set of outputs?

desert oar Sep 22, 2020, 3:58 PM

#

@merry fern can you give an example of inputs and outputs for this? fake data is fine, maybe just 5 rows

grave frost Sep 22, 2020, 3:58 PM

#

It will predict many classes in succession right? (like 1 then 3 etc..)

desert oar Sep 22, 2020, 3:58 PM

#

@grave frost can you give an example of more records here?

#

where are these digit strings coming from

hasty grail Sep 22, 2020, 3:58 PM

#

what exactly is the relationship you're trying to model

desert oar Sep 22, 2020, 3:58 PM

#

and where are the numbers coming from

grave frost Sep 22, 2020, 3:58 PM

#

Of a hash and its decoded value

desert oar Sep 22, 2020, 3:59 PM

#

ok. and what is the space of all valid decoded values?

grave frost Sep 22, 2020, 3:59 PM

#

but that's more confusing to explain

#

from 1-2.5 Million

desert oar Sep 22, 2020, 3:59 PM

#

no it's not more confusing to explain

merry fern Sep 22, 2020, 3:59 PM

#

@desert oar i hope this is illustrative

📎 unknown.png

hasty grail Sep 22, 2020, 3:59 PM

#

how many possible in/out values are there?

desert oar Sep 22, 2020, 4:00 PM

#

it sounds like 2.5 million outputs

hasty grail Sep 22, 2020, 4:00 PM

#

that's definitely not a small set

desert oar Sep 22, 2020, 4:00 PM

#

@merry fern so you want to stack them vertically?

grave frost Sep 22, 2020, 4:00 PM

#

So which loss to use?

merry fern Sep 22, 2020, 4:00 PM

#

@merry fern so you want to stack them vertically?
@desert oar right, to create a master list

im compiling differences between data sources, and then i want to have a master list of differences

#

without losing reference to where those differences originate

grave frost Sep 22, 2020, 4:01 PM

#

So what exactly should I do to accomplish my goal?

hasty grail Sep 22, 2020, 4:02 PM

#

assuming your decoded values are in the range of int64 you can use MAE

desert oar Sep 22, 2020, 4:02 PM

#

!e ```python
import pandas as pd

data1 = pd.DataFrame({
'XYZ': {'ISIN': 123, 'Q': 100, 'P': 1},
'ABC': {'ISIN': 345, 'Q': 200, 'P': 2},
}).rename_axis(index='Type')

data2 = pd.DataFrame({
'XYZ': {'ISIN': 123, 'Q': 100, 'P': 1},
'ABC': {'ISIN': 345, 'Q': 200, 'P': 2},
}).rename_axis(index='Type')

data = pd.concat({'A': data1, 'B': data2})
print( data )

grave frost Sep 22, 2020, 4:02 PM

#

@hasty grail mape?

arctic wedgeBOT Sep 22, 2020, 4:02 PM

#

@desert oar :white_check_mark: Your eval job has completed with return code 0.

001 |         XYZ  ABC
002 |   Type          
003 | A ISIN  123  345
004 |   Q     100  200
005 |   P       1    2
006 | B ISIN  123  345
007 |   Q     100  200
008 |   P       1    2

hasty grail Sep 22, 2020, 4:02 PM

#

sorry, MAE

grave frost Sep 22, 2020, 4:02 PM

#

Will that fix the error?

hasty grail Sep 22, 2020, 4:03 PM

#

no

#

your inputs are fundamentally incorrect

desert oar Sep 22, 2020, 4:03 PM

#

@grave frost you might be able to get away with classification if you use negative sampling like they do in NLP models http://mccormickml.com/2017/01/11/word2vec-tutorial-part-2-negative-sampling/

hasty grail Sep 22, 2020, 4:03 PM

#

you need to input it as a sequence instead of a single number

desert oar Sep 22, 2020, 4:03 PM

#

and yes 100% the input needs to be a sequence of digits and not a number

grave frost Sep 22, 2020, 4:03 PM

#

SO a string?

desert oar Sep 22, 2020, 4:03 PM

#

yes, a string

#

that just happens to contain digits

hasty grail Sep 22, 2020, 4:03 PM

#

more like a sequence of one-hot encoded vectors

grave frost Sep 22, 2020, 4:04 PM

#

so Tf.String is an acceptable dtype?

desert oar Sep 22, 2020, 4:04 PM

#

but you still need to figure out how to convert said string to numbers

#

whether one-hot-encoded or something else

#

@merry fern did you see my example above? the important part is passing a dict to pd.concat

grave frost Sep 22, 2020, 4:04 PM

#

But not the one long number?

desert oar Sep 22, 2020, 4:04 PM

#

correct, definitely not one big number

hasty grail Sep 22, 2020, 4:05 PM

#

one-hot is the simplest way to do it imo

desert oar Sep 22, 2020, 4:05 PM

#

since you only have 10 individual characters in a string, one hot encoding is parsimonious and sensible
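A minimal sketch of the one-hot idea discussed above, assuming the input is a string of decimal digits. The function name `one_hot_digits` is hypothetical, and the result is a plain list of lists rather than a TensorFlow tensor; converting it is left to whatever model code consumes it.

```python
def one_hot_digits(digits):
    """Encode a string of decimal digits as a sequence of 10-wide one-hot vectors."""
    if not digits or any(ch not in "0123456789" for ch in digits):
        raise ValueError("expected a non-empty string containing only the digits 0-9")
    encoded = []
    for ch in digits:
        row = [0] * 10          # one slot per possible digit
        row[int(ch)] = 1        # mark the slot for this digit
        encoded.append(row)
    return encoded


# Each character becomes its own vector, so "407" yields three rows.
for row in one_hot_digits("407"):
    print(row)
```

The point of the encoding is that the model sees each digit as a separate category instead of treating the whole string as one large magnitude.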

grave frost Sep 22, 2020, 4:05 PM

#

alright, I will implement it. Anything else I need to keep in mind?

desert oar Sep 22, 2020, 4:05 PM

#

like i said, use negative sampling

#

i linked a blog post about it above

#

it's what they use in "traditional" word2vec to improve training time

merry fern Sep 22, 2020, 4:06 PM

#

@desert oar this part of the code is what i was looking for! data = pd.concat({'A': data1, 'B': data2})

desert oar Sep 22, 2020, 4:06 PM

#

you could try your model as a regression model too @grave frost but im skeptical that it's the right thing here

#

@merry fern good. note that this will give you a multi-index on your dataframe, which increases the complexity level

grave frost Sep 22, 2020, 4:06 PM

#

No regression

hasty grail Sep 22, 2020, 4:06 PM

#

some hashes are done such that the distribution is essentially random

grave frost Sep 22, 2020, 4:07 PM

#

It is random. That's the whole point

desert oar Sep 22, 2020, 4:07 PM

#

yeah i also think this project will go nowhere btw

#

might be a good exercise in learning tensorflow but

merry fern Sep 22, 2020, 4:07 PM

#

yes thats what i meant, multi-index. the funny thing is when i output to csv, it puts the multi-index in every line

but in the python console, it just shows a header line and then goes forward, example:

Master Break List 
                                Quantity     Price
           Type ISIN                             
Int vs. PB Bond RU000A0JWHA4  7000000.0 -0.258750
                RU000A0ZYYN4 -3000000.0 -1.123458

vs.

📎 unknown.png

desert oar Sep 22, 2020, 4:07 PM

#

if you could reverse hashes with machine learning we'd have really big problems on our hands

hasty grail Sep 22, 2020, 4:07 PM

#

yeah xD

merry fern Sep 22, 2020, 4:07 PM

#

is that by design?

desert oar Sep 22, 2020, 4:07 PM

#

@merry fern yes, that's intentional

merry fern Sep 22, 2020, 4:07 PM

#

great

grave frost Sep 22, 2020, 4:07 PM

#

I have to find a very Small relationship. Something that even bring the loss to 0.002 is outstanding

merry fern Sep 22, 2020, 4:07 PM

#

awesome awesome awesome

grave frost Sep 22, 2020, 4:08 PM

#

It's just a POC

hasty grail Sep 22, 2020, 4:08 PM

#

0.2 would be outstanding already

grave frost Sep 22, 2020, 4:08 PM

#

For another architecture I had in mind

desert oar Sep 22, 2020, 4:08 PM

#

indeed, you are going to basically be depending on the PRNG being insecure

merry fern Sep 22, 2020, 4:08 PM

#

i have made leaps and bounds in digesting excel data, mapping columns, creating new columns based off data, and organizing the output! thank you @desert oar @velvet thorn and everyone else

desert oar Sep 22, 2020, 4:08 PM

#

if you can get the accuracy anywhere close to 5% i'll be surprised

#

@merry fern thats great to hear. the more you learn the easier (and more fun) it gets

merry fern Sep 22, 2020, 4:08 PM

#

yes!

#

and hope to someday be able to sit in here and answer some questions too...

desert oar Sep 22, 2020, 4:09 PM

#

you probably already can

grave frost Sep 22, 2020, 4:09 PM

#

I am expecting val acc to be 0.004-ish

hasty grail Sep 22, 2020, 4:09 PM

#

depending on how many possible outputs are there

grave frost Sep 22, 2020, 4:09 PM

#

That's the most realistic

#

0.004%

desert oar Sep 22, 2020, 4:09 PM

#

also here's a hot tip @merry fern :

#

!e ```python
import pandas as pd

data1 = pd.DataFrame({
    'XYZ': {'ISIN': 123, 'Q': 100, 'P': 1},
    'ABC': {'ISIN': 345, 'Q': 200, 'P': 2},
}).rename_axis(index='Type')

data2 = pd.DataFrame({
    'XYZ': {'ISIN': 123, 'Q': 100, 'P': 1},
    'ABC': {'ISIN': 345, 'Q': 200, 'P': 2},
}).rename_axis(index='Type')

data = pd.concat({'A': data1, 'B': data2}, names=['source'])
print( data )
```

arctic wedgeBOT Sep 22, 2020, 4:09 PM

#

@desert oar :white_check_mark: Your eval job has completed with return code 0.

001 |              XYZ  ABC
002 | source Type          
003 | A      ISIN  123  345
004 |        Q     100  200
005 |        P       1    2
006 | B      ISIN  123  345
007 |        Q     100  200
008 |        P       1    2

desert oar Sep 22, 2020, 4:10 PM

#

look at the names of the multiindex when you use the names= parameter

#

nice little convenience feature
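A small sketch of what the named outer level buys you, assuming two frames shaped like the ones in the example (the numbers here are made up). With `names=['source']`, the outer level can be referred to by name, and `reset_index()` turns both index levels into ordinary columns, which is one way to get a flat table before writing a CSV.

```python
import pandas as pd

# Two small frames with a named index, mirroring the shape used in the chat.
data1 = pd.DataFrame(
    {"XYZ": [100, 1], "ABC": [200, 2]},
    index=pd.Index(["Q", "P"], name="Type"),
)
data2 = data1 * 10

# The dict keys become the outer index level; names= labels that level.
data = pd.concat({"A": data1, "B": data2}, names=["source"])

print(list(data.index.names))   # outer level name followed by the original one
print(data.loc["A"])            # only the rows that came from data1

# Flatten the MultiIndex into regular columns.
flat = data.reset_index()
print(flat)
```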

grave frost Sep 22, 2020, 4:10 PM

#

Negative sampling drops out some inputs

#

*not consider them

desert oar Sep 22, 2020, 4:11 PM

#

@merry fern otherwise you'd have to write

data = pd.concat({'A': data1, 'B': data2}).rename_axis(index=['source', 'Type'])

merry fern Sep 22, 2020, 4:11 PM

#

nice i just added that

desert oar Sep 22, 2020, 4:11 PM

#

@grave frost yes, otherwise it would be computationally ridiculous to compute the weight update for all 2.5m outputs

grave frost Sep 22, 2020, 4:12 PM

#

Alright, as long as it implement in a line or 2 (as if that ever happens)

desert oar Sep 22, 2020, 4:12 PM

#

thats the wrong attitude

grave frost Sep 22, 2020, 4:12 PM

#

But not gonna lose the optimism

#

🙂

desert oar Sep 22, 2020, 4:12 PM

#

you use the methods that exist, not the tools you happen to have sitting right in front of you

#

if you want to do something, either you do it or you don't

#

try without negative sampling and see what happens

merry fern Sep 22, 2020, 4:12 PM

#

leaving for now, need to workout i feel l like shit. ttyl

desert oar Sep 22, 2020, 4:12 PM

#

if it works, then great

grave frost Sep 22, 2020, 4:13 PM

#

ok, 1-hot, MAE and -ve sampling. anything else?

desert oar Sep 22, 2020, 4:13 PM

#

but it will probably be dog slow

#

MAE is incompatible with classification

#

you know this

grave frost Sep 22, 2020, 4:13 PM

#

No

desert oar Sep 22, 2020, 4:13 PM

#

do you know what MAE stands for?

#

do you know how it is defined?

#

it literally does not make sense on a classification problem

#

it is for regression problems

#

if you do this as a regression problem, then yes MAE is valid

grave frost Sep 22, 2020, 4:14 PM

#

Damn

#

and as a classification?

desert oar Sep 22, 2020, 4:14 PM

#

accuracy, precision, recall, f1, ...
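To make the distinction concrete, here is a rough pure-Python sketch (the helper names and the toy labels are made up for illustration). MAE needs a numeric distance between prediction and target, which class labels do not have; the classification metrics only count matches and mismatches.

```python
def mean_absolute_error(y_true, y_pred):
    """Regression metric: average size of the numeric error."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def accuracy(y_true, y_pred):
    """Classification metric: fraction of labels predicted exactly."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def precision_recall_f1(y_true, y_pred, positive):
    """Precision, recall and F1 for one class treated as the positive class."""
    tp = sum(t == positive and p == positive for t, p in zip(y_true, y_pred))
    fp = sum(t != positive and p == positive for t, p in zip(y_true, y_pred))
    fn = sum(t == positive and p != positive for t, p in zip(y_true, y_pred))
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


# Regression: targets are numbers, so "how far off" is meaningful.
print(mean_absolute_error([1.0, 2.0, 4.0], [1.5, 2.0, 3.0]))

# Classification: targets are labels, so only right/wrong is meaningful.
labels_true = ["cat", "dog", "dog", "bird"]
labels_pred = ["cat", "dog", "cat", "bird"]
print(accuracy(labels_true, labels_pred))
print(precision_recall_f1(labels_true, labels_pred, positive="dog"))
```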

severe spindle Sep 22, 2020, 4:14 PM

#

I would like to learn how to use machine learning algorithms, I have no prior experience with machine learning. Can someone recommend a good place to learn from?

desert oar Sep 22, 2020, 4:15 PM

#

@severe spindle how much programming experience do you have?

grave frost Sep 22, 2020, 4:15 PM

#

k, I will do classification

severe spindle Sep 22, 2020, 4:15 PM

#

@desert oar Several years , I'm a third year computer science student at university

desert oar Sep 22, 2020, 4:15 PM

#

using machine learning algorithms is not the right way to think about it. machine learning is a process that requires using these algorithms at some point

#

they dont make sense outside the context of actually doing some kind of prediction or other machine learning work

#

since you already know programming, the course https://fast.ai might be a good place to start

#

they get you right into the sexy deep learning stuff

severe spindle Sep 22, 2020, 4:17 PM

#

okay sounds really good, thanks for explaining a little too 😄

desert oar Sep 22, 2020, 4:17 PM

#

if you want to make a career out of this (or do more than trivial hobby projects) you'll eventually want to go back and get into the foundational math and start learning stats as well

grave frost Sep 22, 2020, 4:17 PM

#

Thanx a ton for your help guys and placing me in the right direction! 🙂 @desert oar @hasty grail @velvet thorn

hasty grail Sep 22, 2020, 4:17 PM

#

np and... good luck

desert oar Sep 22, 2020, 4:17 PM

#

you're welcome both, sorry i'm in a curt mood today but i just dont want people wasting their time on stuff that's not worth doing

grave frost Sep 22, 2020, 4:18 PM

#

Nothing to worry about. I know it is hopeless, but I need a baseline

hasty grail Sep 22, 2020, 4:18 PM

#

https://arxiv.org/pdf/1901.02438.pdf

#

This might be a place to start

grave frost Sep 22, 2020, 4:18 PM

#

It is actually a POC for establishing that there is bias in randomness. A base for a new architecture I am planning to develop.

hasty grail Sep 22, 2020, 4:18 PM

#

first thing that showed up in my search

severe spindle Sep 22, 2020, 4:18 PM

#

I'm taking some ML modules this year so I'm sure I'll have to hit the books for the maths at some point too but you've given me a good place to start, thanks

desert oar Sep 22, 2020, 4:19 PM

#

there is bias in randomness
i think you might need to qualify this a bit. there is a lot of literature already on cryptography... like 50 years of it, written by very smart people

grave frost Sep 22, 2020, 4:21 PM

#

NNs cannot predict plaintext from encrypted text. The best you can do is REDUCE the time taken to decrypt, which is what my theoretical architecture does. But right now it's in a very naive and basic phase (too many assumptions, too little testing). It would need people with PhDs to make it...

#

@desert oar A simple assumption that anything random can be proved to be non-random given a complex enough function

#

That is basically what linear models do, albeit on a very small scale.

#

There are several theorems and proofs on it

#

Good Night to all!

desert oar Sep 22, 2020, 4:25 PM

#

i dont think thats what the universal approximation theorem actually says...

#

but good luck

merry fern Sep 22, 2020, 4:37 PM

#

@desert oar next rabbit hole for me is rendering results in a web platform 😛

grave frost Sep 22, 2020, 5:24 PM

#

@desert oar You don't need to understand any theorems for the basic idea; it's pretty intuitive in itself. Consider a series of numbers: 0, 2, 4, 6, 8. Then f(x)=2x, for x in W. Now consider a relatively more complex series: 0, 1, 4, 9. Here, f(x)=x^2. For ANY given sequence of numbers, I can compute a corresponding function to represent that data. Assume you have much less intelligence than the average human, only enough to grasp basic arithmetic. Then the second set of numbers might look like random numerals to you. But actually they are governed by a function whose complexity is outside your understanding. A NN tries to approximate that function. So no matter how random a set of numbers you give, there will always be a relation. Now it might be that the relation is too complex to be computable with normal machines and would require quantum-level power to compute, but there will always be a relation. http://neuralnetworksanddeeplearning.com/chap4.html Look at this link for a more visual idea

elder creek Sep 22, 2020, 5:33 PM

#

Hey all! Anyone able to answer a question about calling data from a linear regression model (OLS) generated by statsmodels?

#

I'm tryin to get a list of the pvalues greater than .05, using:

print('P Values: ', model.pvalues > .05)

This returns a df showing every row and the Boolean value for the pvalue of each row. I'm new to python (and programming), so I don't know a lot of things that should be obvious.

strong osprey Sep 22, 2020, 5:53 PM

#

Hey guys! How can I use current value when updating data in Pandas DataFrame?

desert oar Sep 22, 2020, 5:57 PM

#

@grave frost i am familiar with the universal approximation theorem. what you seem to be ignorant of is the large body of work dedicated to making PRNGs look and act as random as possible, specifically for the purpose of making what you are trying to do effectively impossible

#

it more or less forms the basis of modern cryptography

#

if you dont want to take my word for it, go ask about it on a math forum or computer science forum or cryptography forum. see what they have to say about your project

#

i'm willing to be proven wrong, but not by a misquoted textbook chapter

#

@strong osprey can you clarify what you are trying to do? preferably with sample input data and the desired result

#

@elder creek the result of model.pvalues > .05 is a Series containing Boolean values -- you can use that Series to select only the True rows like so:

model.pvalues.loc[model.pvalues > .05]
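A quick sketch of that boolean-mask selection, using a hand-made Series as a stand-in for `model.pvalues` (the coefficient names and values below are invented; with statsmodels the Series would come from the fitted results object).

```python
import pandas as pd

# Stand-in for model.pvalues: one p-value per coefficient, indexed by name.
pvalues = pd.Series({"const": 0.001, "age": 0.20, "income": 0.03, "height": 0.75})

# Comparing a Series to a number gives a Series of booleans with the same index.
mask = pvalues > 0.05
print(mask)

# Passing that boolean Series to .loc keeps only the rows where it is True.
above_threshold = pvalues.loc[mask]
print(above_threshold)

# If only the names are needed, take the index of the filtered Series.
print(above_threshold.index.tolist())
```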

elder creek Sep 22, 2020, 5:59 PM

#

Yes

desert oar Sep 22, 2020, 6:00 PM

#

(mistakes corrected above)

elder creek Sep 22, 2020, 6:00 PM

#

Wow, awesome, thank you so much

desert oar Sep 22, 2020, 6:00 PM

#

i'd also encourage you to avoid selecting features blindly by p > 0.05

strong osprey Sep 22, 2020, 6:00 PM

#

I want to append data to cell. I know I can do :

self.data.loc[self.data['SKU'] == '826945379', 'Images'] = 'aaa'

#

but how can i use the data that already is in the cell

elder creek Sep 22, 2020, 6:01 PM

#

I've got 136 beta values... just looking to remove the ones with more than .05 pvalue

desert oar Sep 22, 2020, 6:01 PM

#

why 0.05? why not 0.01? have you adjusted the p-values for multiple comparisons? does it even make sense to compare the coefficient to 0? is the model actually homoskedastic i.e. does it satisfy the statistical assumptions required to do such t-tests?

#

that is an outdated feature selection procedure in my opinion

#

what is the purpose of your model? to make predictions? or to make inferences about underlying relationships?

elder creek Sep 22, 2020, 6:02 PM

#

Wow, interesting

#

Yeah, making predictions

desert oar Sep 22, 2020, 6:02 PM

#

in that case, use regularization, ridge or lasso

elder creek Sep 22, 2020, 6:02 PM

#

Using lasso soon

#

This is more about understanding what is going on conceptually

desert oar Sep 22, 2020, 6:02 PM

#

banish all thought of stepwise selection

elder creek Sep 22, 2020, 6:02 PM

#

I get the concept piece

#

It's for a class... I've gotten the answers I need, I just want to call it in a tidy manner

desert oar Sep 22, 2020, 6:03 PM

#

@strong osprey ```python
sel = self.data['SKU'] == '826945379'
self.data.loc[sel, 'Images'] = foo(self.data.loc[sel, 'Images'])
```

jaunty scroll Sep 22, 2020, 6:03 PM

#

How do I access attributes of an XML element if those attributes are subelements?

desert oar Sep 22, 2020, 6:03 PM

#

its a shame they are still teaching that shit in classes

#

anyway hopefully the code snippet helps
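For reference, a self-contained version of that select-transform-assign pattern, assuming the goal is to append text to whatever is already in the cell. The sample frame and the `,aaa` suffix are invented for illustration; the string concatenation stands in for the `foo` placeholder in the snippet.

```python
import pandas as pd

data = pd.DataFrame({
    "SKU": ["826945379", "111"],
    "Images": ["a.png", "b.png"],
})

# Boolean mask for the rows to update.
sel = data["SKU"] == "826945379"

# Read the current values for those rows, transform them, and write them back.
data.loc[sel, "Images"] = data.loc[sel, "Images"] + ",aaa"

print(data)
```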

#

@jaunty scroll are you using a specific library to do this?

jaunty scroll Sep 22, 2020, 6:04 PM

#

@desert oar element tree

desert oar Sep 22, 2020, 6:04 PM

#

and can you give an example of some XML & the resulting values you want?

#

this isnt really a data science question but it might be relevant to people here. normally in the future i would recommend asking questions like this in a help channel, see #❓｜how-to-get-help

jaunty scroll Sep 22, 2020, 6:04 PM

#

oh ok @desert oar that's good to know I wasn

strong osprey Sep 22, 2020, 6:04 PM

#

@desert oar thanks, i see

jaunty scroll Sep 22, 2020, 6:04 PM

#

't really sure where to ask

#

but this is part of the data structure

📎 unknown.png

#

I'm asking it here because the end result of this question is going to be a parser that converts to csv and from there into RedShift using dataframe

desert oar Sep 22, 2020, 6:10 PM

#

ah, then it probably is relevant here

#

hard to know

jaunty scroll Sep 22, 2020, 6:10 PM

#

yeah one of those fringe projects imo

#

this parser has been a pain because part of the XML just has standard tags and attributes and then here it does this multi-level thing that breaks the program unless I just treat all the elements as equivalent which is useless

desert oar Sep 22, 2020, 6:11 PM

#

do you have the actual definition for ns1?

#

i think lxml has better namespace support

#

if you include that as text and not just an image, i can make an example for you @jaunty scroll

jaunty scroll Sep 22, 2020, 6:16 PM

#

one moment sorry was afk

#

                        <ns1:recordIdentifier>33</ns1:recordIdentifier>
                        <ns1:insuredMemberIdentifier>P12561002023TRS</ns1:insuredMemberIdentifier>
                        <ns1:insuredMemberBirthDate>1989-12-31</ns1:insuredMemberBirthDate>
                        <ns1:insuredMemberGenderCode>M</ns1:insuredMemberGenderCode>
                        <ns1:includedInsuredMemberProfile>
                                <ns1:recordIdentifier>34</ns1:recordIdentifier>
                                <ns1:subscriberIndicator>S</ns1:subscriberIndicator>
                                <ns1:subscriberIdentifier></ns1:subscriberIdentifier>
                                <ns1:insurancePlanIdentifier>93182VA013001402</ns1:insurancePlanIdentifier>
                                <ns1:coverageStartDate>2015-01-01</ns1:coverageStartDate>
                                <ns1:coverageEndDate>2015-12-31</ns1:coverageEndDate>
                                <ns1:enrollmentMaintenanceTypeCode>021028</ns1:enrollmentMaintenanceTypeCode>
                                <ns1:insurancePlanPremiumAmount>450.00</ns1:insurancePlanPremiumAmount>
                                <ns1:rateAreaIdentifier>003</ns1:rateAreaIdentifier>
                        </ns1:includedInsuredMemberProfile>
                </ns1:includedInsuredMember>
                <ns1:includedInsuredMember>

#

do you want like an actual text file?

desert oar Sep 22, 2020, 6:18 PM

#

uhh this isnt PII right?

jaunty scroll Sep 22, 2020, 6:18 PM

#

no

#

its example data

desert oar Sep 22, 2020, 6:18 PM

#

ok 😅

#

and you want 1 insured member per row?

jaunty scroll Sep 22, 2020, 6:19 PM

#

my parser breaks when it gets down to the bottom of these multi-level stacks like this because I think its looking for a standard attribute like a string or integer

desert oar Sep 22, 2020, 6:19 PM

#

can you show your current code

jaunty scroll Sep 22, 2020, 6:20 PM

#

```python
for node in tree.iter(None):
    print('\n')
    for elem in node.iter():
        if elem.tag != node.tag:
            print("{}: {}".format(elem.tag, elem.text))
```

desert oar Sep 22, 2020, 6:21 PM

#

ok 1 min

#

this is maybe a stupid question but why does the insuredMemberProfile have a separate record identifier from the insuredMember itself

jaunty scroll Sep 22, 2020, 6:23 PM

#

that is actually a good question that I couldn't even begin to answer other than to say that the fine folks in the federal government choose to organize their files this way and I have no say in the matter

desert oar Sep 22, 2020, 6:23 PM

#

lol ok

#

the reason i ask is -- you want to flatten all that stuff in there into 1 record?

jaunty scroll Sep 22, 2020, 6:24 PM

#

yea, ideally I want nested dictionaries that can be read into dataframe

#

I'm very new at this and kind of learning as I go so if there's something I say that makes no sense please correct me or ask for clarification

desert oar Sep 22, 2020, 6:25 PM

#

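import xml.etree.ElementTree as ET

tree = ET.parse('InboundMedicalClaimFileExample.xml')

included_insured_members = []
# note: ElementTree reports namespaced tags as '{namespace-uri}localname',
# so the 'ns1:...' strings below may need adjusting to match the real file
for node in tree.iter('ns1:includedInsuredMember'):
    member_info = {}
    included_insured_members.append(member_info)
    for child in node:
        if child.tag == 'ns1:includedInsuredMemberProfile':
            # nested profile: copy its leaf elements into the same record
            for subnode in child:
                member_info[subnode.tag] = subnode.text
        else:
            member_info[child.tag] = child.text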

what about something like this?

jaunty scroll Sep 22, 2020, 6:28 PM

#

let me play with this for a min but this looks hopefully

#

hopeful*

#

so its giving me a mismatched tag error

#

  File "test.py", line 3, in <module>
    tree = ET.parse('inboundenrollmentfile.xml')
  File "C:\Program Files\WindowsApps\PythonSoftwareFoundation.Python.3.8_3.8.1729.0_x64__qbz5n2kfra8p0\lib\xml\etree\ElementTree.py", line 1202, in parse
    tree.parse(source, parser)
  File "C:\Program Files\WindowsApps\PythonSoftwareFoundation.Python.3.8_3.8.1729.0_x64__qbz5n2kfra8p0\lib\xml\etree\ElementTree.py", line 595, in parse
    self._root = parser._parse_whole(source)
xml.etree.ElementTree.ParseError: mismatched tag: line 439, column 2

desert oar Sep 22, 2020, 6:32 PM

#

that sounds like a problem in the file

jaunty scroll Sep 22, 2020, 6:33 PM

#

ah yes it is, that's the last line in the file

#

I just sent you the middle part because that's what was giving me the error but I think if I update these tags to capture the whole file this could work

#

I could extend this code to work for multiple such tiered data structures right?

desert oar Sep 22, 2020, 6:35 PM

#

yes of course

#

notably, bool(node) will tell you if the node has children

#

so you can say ```python
if node:
    for subnode in node:
        ...
else:
    node.text ...
```

jaunty scroll Sep 22, 2020, 6:38 PM

#

cool thanks for the help it is much appreciated

desert oar Sep 22, 2020, 6:49 PM

#

@jaunty scroll you can flatten nodes recursively/infinitely too if you want

def flatten_node(node):
    result_container = {}
    if node:
        for subnode in node:
            result_container = {**result_container, **flatten_node(subnode)}
    else:
        result_container[node.tag] = node.text
    return result_container

although this will fail on XML where you have 2 nodes with the same tag

#

(among many other limitations)
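A sketch of one way around the same-tag limitation and the namespace prefixes, using only the standard library. Everything named here is an assumption for illustration: the namespace URI is a placeholder (the real file declares its own), the sample XML is a cut-down version of the snippet pasted earlier, and `local_name` is a hypothetical helper. Unlike the version above, keys are prefixed with the parent tag, so the two `recordIdentifier` elements at different depths no longer overwrite each other. Repeated tags at the same depth would still collide.

```python
import xml.etree.ElementTree as ET

# Placeholder namespace URI; substitute the one declared in the real file.
NS = "urn:example:placeholder"

SAMPLE = """
<ns1:root xmlns:ns1="urn:example:placeholder">
  <ns1:includedInsuredMember>
    <ns1:recordIdentifier>33</ns1:recordIdentifier>
    <ns1:insuredMemberGenderCode>M</ns1:insuredMemberGenderCode>
    <ns1:includedInsuredMemberProfile>
      <ns1:recordIdentifier>34</ns1:recordIdentifier>
      <ns1:rateAreaIdentifier>003</ns1:rateAreaIdentifier>
    </ns1:includedInsuredMemberProfile>
  </ns1:includedInsuredMember>
</ns1:root>
""".strip()


def local_name(tag):
    """Drop the '{uri}' prefix that ElementTree puts on namespaced tags."""
    return tag.rsplit("}", 1)[-1]


def flatten_node(node, prefix=""):
    """Flatten leaf elements into {dotted.path: text}."""
    result = {}
    for child in node:
        key = prefix + local_name(child.tag)
        if len(child):  # the element has children, so recurse into it
            result.update(flatten_node(child, key + "."))
        else:
            result[key] = child.text
    return result


root = ET.fromstring(SAMPLE)
members = [
    flatten_node(member)
    for member in root.iter("{%s}includedInsuredMember" % NS)
]
print(members)
```

A list of flat dicts like `members` can be passed straight to `pd.DataFrame(members)` to get one row per insured member.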

jaunty scroll Sep 22, 2020, 6:58 PM

#

what's the purpose of those ** @desert oar

#

aren't those usually for kwargs in function parameters?

desert oar Sep 22, 2020, 7:09 PM

#

@jaunty scroll it serves the same function here, except to the {} dict constructor

#

!e ```python
x = {'a': 1, 'b': 2}
y = {'b': 102, 'c': 103}
print( {**x, **y} )
```

arctic wedgeBOT Sep 22, 2020, 7:09 PM

#

@desert oar :white_check_mark: Your eval job has completed with return code 0.

{'a': 1, 'b': 102, 'c': 103}

jaunty scroll Sep 22, 2020, 7:11 PM

#

so is constructor that a kind of anonymous function? not sure if that's the right term but it is essentially doing what a function would do here it seems like

#

that constructor*

desert oar Sep 22, 2020, 7:13 PM

#

its not a function

#

but it uses similar syntax

#

you can think of it as the people who designed the python language being clever and letting you use the same syntax in multiple places to mean similar things
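A side-by-side sketch of the two uses of `**`, with a throwaway function named `describe` made up for the comparison. One difference worth knowing: in a dict display a repeated key is fine and the later value wins, while in a function call a repeated keyword is an error.

```python
def describe(**kwargs):
    """Return the keyword arguments as a sorted list of (name, value) pairs."""
    return sorted(kwargs.items())


x = {"a": 1, "b": 2}
y = {"b": 102, "c": 103}

# In a dict display, ** unpacks each mapping in order; later keys overwrite earlier ones.
merged = {**x, **y}
print(merged)

# In a call, ** unpacks a mapping into keyword arguments.
print(describe(**x))

# Unpacking both into one call fails because 'b' would be passed twice.
try:
    describe(**x, **y)
except TypeError as exc:
    print("TypeError:", exc)
```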

feral trout Sep 22, 2020, 7:23 PM

#

how do I get the plain text from pytesseract.image_to_data?

wise garden Sep 22, 2020, 8:17 PM

#

I've got two dfs (both 24 rows) in pandas but have different indices and I can't figure out how to multiply them together. The result is series that has 25 rows so I know something is wrong

#

The different indices is due to the fact I sliced them from different parts of a data set

rustic apex Sep 22, 2020, 8:20 PM

#

How often do you use Numpy by itself? Vs with Pandas or MatLab?

desert oar Sep 22, 2020, 8:29 PM

#

@wise garden do you not care about the indices? then just do x.reset_index(drop=True) * y.reset_index(drop=True)

#

if you do care about the indexes youll have to do some more work
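A short sketch of why the row count grows, using two made-up Series whose labels only partly overlap. pandas aligns on index labels before multiplying, so the result covers the union of labels and gets NaN wherever one side has no match; resetting the index makes the multiplication positional instead.

```python
import pandas as pd

x = pd.Series([1, 2, 3], index=[0, 1, 2])
y = pd.Series([10, 20, 30], index=[1, 2, 3])

# Label-aligned: result has labels 0..3, with NaN where a label is missing on one side.
aligned = x * y
print(aligned)

# Positional: drop both indexes so row i is multiplied with row i.
positional = x.reset_index(drop=True) * y.reset_index(drop=True)
print(positional)

# Positional, but keeping x's labels: multiply by the raw values of y.
keep_x_labels = x * y.to_numpy()
print(keep_x_labels)
```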

wise garden Sep 22, 2020, 8:30 PM

#

no this is perfect thx

keen root Sep 22, 2020, 8:41 PM

#

Hi, not sure if this is the right channel, but are all GPU calculations made with tensorflow based on CUDA? That is, if I have a GPU without CUDA support I'm "doomed"?

#

I tried to run a CNN the other day and poor CPU... I could almost hear the screams

desert oar Sep 22, 2020, 8:58 PM

#

@keen root pretty much, yes

#

CUDA is what allows programs to "talk to" the GPU without writing low level graphics code

#

tensorflow certainly depends on it

#

i specifically bought an nvidia 1060 for this reason 🙂 even though its not good for actual machine learning performance, at least it runs CUDA so its good enough to test code before paying for cloud compute

keen root Sep 22, 2020, 9:00 PM

#

that's too bad... This old baby goes back to early high school. If only I knew I would be needing cuda someday 😅

#

thank you anyway

serene scaffold Sep 22, 2020, 9:02 PM

#

@desert oar what did that cost?

desert oar Sep 22, 2020, 9:02 PM

#

@serene scaffold i got the whole rig for $300 without SSD

#

it was a 4 year old gaming rig

serene scaffold Sep 22, 2020, 9:02 PM

#

I'm hopefully going to build a pc when I graduate

desert oar Sep 22, 2020, 9:03 PM

#

some local kid was upgrading

serene scaffold Sep 22, 2020, 9:03 PM

#

Of course they were

#

I'm totally not jealous of gamer kids with better machines than me.

desert oar Sep 22, 2020, 9:03 PM

#

it was a steal tbh, i dont play recent games much, or if i do i dont care if they are max settings

serene scaffold Sep 22, 2020, 9:05 PM

#

Also I finished that assignment. The prof said it was the hardest thing for the whole course. But it seemed like it was just pandas and numpy fundamentals.

#

So idk what the rest of the class will even be.

desert oar Sep 22, 2020, 9:05 PM

#

which assignment was it again?

#

hopefully that doesnt mean you did it wrong 😉

serene scaffold Sep 22, 2020, 9:06 PM

#

Mean imputation and hot deck imputation

desert oar Sep 22, 2020, 9:06 PM

#

that, or, they were expecting people to do it in java

serene scaffold Sep 22, 2020, 9:06 PM

#

I got the same answers as two of my friends.

#

I mean I guess you could have done it in Java if you wanted to be eight levels deep in for loops.

desert oar Sep 22, 2020, 9:09 PM

#

yeah idk

#

maybe its an easy course i guess

#

seems like a waste of time if thats the hardest thing in the whole semester

#

no offense to your instructor but, theres no point in paying for school if you arent being pushed past your limits in a controlled and constructive way (imo)

#

otherwise you'd just go read a book at home

#

(there are other benefits to school too, namely the opportunity to meet and talk with other people working on similar problems as you with similar interests who might later form a professional network and also form a support network while in school)

serene scaffold Sep 22, 2020, 9:14 PM

#

@desert oar I think of formal education as a really expensive way to get your knowledge in a certain area accredited, and that all learning is basically self-learning.

desert oar Sep 22, 2020, 9:15 PM

#

and this is why people feel like education is a ripoff

#

because if thats all you got out of school, then it was a ripoff

serene scaffold Sep 22, 2020, 9:15 PM

#

yes

desert oar Sep 22, 2020, 9:15 PM

#

what i am saying is, it doesnt have to be like that, and shouldnt be like that

serene scaffold Sep 22, 2020, 9:16 PM

#

the next assignment uses data miner

#

or some platform like that. can't remember the name for sure

merry fern Sep 22, 2020, 9:26 PM

#

classes and dataframes - would it be smart to create a subclass of dataframes if I wanted to dictate their behavior?

for example, if a dataframe was empty, and I went to print it, I would want it to print "No data."

spring elk Sep 22, 2020, 9:38 PM

#

Has anyone ever rented server time/processing time, if so what did you need it for and what requirements did you have ?

desert oar Sep 22, 2020, 10:17 PM

#

@merry fern yes you could subclass dataframe and implement a new __str__ method in the subclass

merry fern Sep 22, 2020, 10:31 PM

#

@merry fern yes you could subclass dataframe and implement a new __str__ method in the subclass
@desert oar
Would it be simple in the sense that I would just set up __init__ and __str__ but it wouldn't screw up any other functionality built in?

#

Thanks

desert oar Sep 22, 2020, 11:20 PM

#

you dont even need init @merry fern

#

!e ```python
import pandas as pd

class MyDataFrame(pd.DataFrame):
    def __str__(self):
        if self.shape[0] == 0:
            return 'No data.'
        else:
            return super().__str__()

df = MyDataFrame(columns=list('xyz'))
print(df)
```

arctic wedgeBOT Sep 22, 2020, 11:21 PM

#

@desert oar :white_check_mark: Your eval job has completed with return code 0.

No data.

merry fern Sep 22, 2020, 11:21 PM

#

Coolz

#

Thx

desert oar Sep 22, 2020, 11:22 PM

#

that said... i dont really recommend this

#

it's not easy to "convert" a regular dataframe to this custom dataframe

#

you have 2 other options: 1) directly override DataFrame.__str__, 2) just write a custom pretty-print function for data frames

#

the overriding __str__ method is worse

#

so i'd recommend just writing a function
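A minimal sketch of the plain-function route, assuming "empty" means zero rows as in the subclass example. The name `format_frame` and its `empty_message` parameter are made up here; the function works on any ordinary DataFrame, so nothing has to be converted to a custom class.

```python
import pandas as pd


def format_frame(df, empty_message="No data."):
    """Return a message for a frame with no rows, otherwise its normal text rendering."""
    if len(df) == 0:
        return empty_message
    return df.to_string()


empty = pd.DataFrame(columns=list("xyz"))
filled = pd.DataFrame({"x": [1], "y": [2], "z": [3]})

print(format_frame(empty))
print(format_frame(filled))
```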

velvet thorn Sep 22, 2020, 11:24 PM

#

classes and dataframes - would it be smart to create a subclass of dataframes if I wanted to dictate their behavior?

for example, if a dataframe was empty, and I went to print it, I would want it to print "No data."
@merry fern but why?

safe tapir Sep 22, 2020, 11:51 PM

#

Anyone know the approximate performance delta between Numba and Numpy?

When preprocessing data, is it better to try to use the built-in pandas functions, or just write your own @numba.jit functions?

desert oar Sep 23, 2020, 12:15 AM

#

the former is usually faster in my experience @safe tapir , unless you're chaining a lot of them together

merry fern Sep 23, 2020, 12:25 AM

#

@merry fern but why?
@velvet thorn so that when I have no results, it doesnt say "empty" it gives a message.

velvet thorn Sep 23, 2020, 12:26 AM

#

I...don't know if that's worth it

merry fern Sep 23, 2020, 12:26 AM

#

yea, exploring the idea. not too familiar with classes but my next step is thinking about class objects i think...

#

also, thanks for your help again, i was able to get my code working! im going to try to learn flask now to render it to a web display

velvet thorn Sep 23, 2020, 12:27 AM

#

I think this is a case where

#

you should recalibrate your brain to understand that "empty DF" -> "no data"

merry fern Sep 23, 2020, 12:27 AM

#

its not for me, its for when the system looks to output the data. but i can also just do a conditional output

desert oar Sep 23, 2020, 12:27 AM

#

or just write a pretty printing function

#

subclassing is intrusive

merry fern Sep 23, 2020, 12:27 AM

#

^^yes

velvet thorn Sep 23, 2020, 12:28 AM

#

or just write a pretty printing function
@desert oar then this

wheat pilot Sep 23, 2020, 1:22 AM

#

im trying to run this pd.DataFrame(xFeat).columns essentially but i get an attribute error 'numpy.ndarray' object has no attribute 'columns' but i dont understand why since im creating it as a dataframe in the same snippet of code

desert oar Sep 23, 2020, 1:51 AM

#

@wheat pilot can you share more of your code? maybe the error is from a different place in the code

#

and can you share the full error traceback?

wheat pilot Sep 23, 2020, 2:00 AM

#

i have everything in #help-cherries

#

although i may have a temporary solution for that error

#

let me know if what ive changed is not going to bite me in the butt later if you have a minute

cerulean ingot Sep 23, 2020, 6:57 AM

#

how to fetch sheet name of csv?

#

im using pandas.dataframe

#

to read csv

#

can anyone help?

velvet thorn Sep 23, 2020, 7:18 AM

#

how to fetch sheet name of csv?
@cerulean ingot what do you mean "sheet name"

cerulean ingot Sep 23, 2020, 7:23 AM

#

this thing in bottom of excel or csv sheet

📎 20200923_125128.jpg

velvet thorn Sep 23, 2020, 7:25 AM

#

this thing in bottom of excel or csv sheet
@cerulean ingot CSVs don't have those

#

at least, not standard CSVs

#

only Excel files

cerulean ingot Sep 23, 2020, 7:25 AM

#

and what about excel?

#

in excel how to read this name

velvet thorn Sep 23, 2020, 7:26 AM

#

did you Google it?

#

I suggest you do, because I found the answer on the first try...

cerulean ingot Sep 23, 2020, 7:28 AM

#

no 😀 I was doing csv then you said no so directly asked

#

@velvet thorn thanks

velvet thorn Sep 23, 2020, 7:29 AM

#

yw but I didn't really do anything

cerulean ingot Sep 23, 2020, 7:50 AM

#

I even appreciate reply 💯

cedar sky Sep 23, 2020, 9:10 AM

#

Hi I need help Installing tensorflow anyone online

hasty grail Sep 23, 2020, 9:11 AM

#

What issues are you running into?

cedar sky Sep 23, 2020, 9:12 AM

#

Helllo @hasty grail I need it for the tensorflow google examination

#

It shows msv or something is not installed

#

Can I share my screen and can you help me

hasty grail Sep 23, 2020, 9:13 AM

#

just paste the error here

#

!pastebin

arctic wedgeBOT Sep 23, 2020, 9:13 AM

#

Pasting large amounts of code

If your code is too long to fit in a codeblock in discord, you can paste your code here:
https://paste.pydis.com/

After pasting your code, save it by clicking the floppy disk icon in the top right, or by typing ctrl + S. After doing that, the URL should change. Copy the URL and post it here so others can see it.

cedar sky Sep 23, 2020, 9:13 AM

#

ok one sec I will open pycharm

#

Traceback (most recent call last):
  File "C:/Users/HariAkash/PycharmProjects/TF_Config/venv/tf_first.py", line 1, in <module>
    import tensorflow
  File "C:\Users\HariAkash\PycharmProjects\TF_Config\venv\lib\site-packages\tensorflow\__init__.py", line 41, in <module>
    from tensorflow.python.tools import module_util as _module_util
  File "C:\Users\HariAkash\PycharmProjects\TF_Config\venv\lib\site-packages\tensorflow\python\__init__.py", line 39, in <module>
    from tensorflow.python import pywrap_tensorflow as _pywrap_tensorflow
  File "C:\Users\HariAkash\PycharmProjects\TF_Config\venv\lib\site-packages\tensorflow\python\pywrap_tensorflow.py", line 28, in <module>
    self_check.preload_check()
  File "C:\Users\HariAkash\PycharmProjects\TF_Config\venv\lib\site-packages\tensorflow\python\platform\self_check.py", line 61, in preload_check
    % " or ".join(missing))
ImportError: Could not find the DLL(s) 'msvcp140_1.dll'. TensorFlow requires that these DLLs be installed in a directory that is named in your %PATH% environment variable. You may install these DLLs by downloading "Microsoft C++ Redistributable for Visual Studio 2015, 2017 and 2019" for your platform from this URL: https://support.microsoft.com/help/2977003/the-latest-supported-visual-c-downloads

#

This is the error

#

I am just 14 currently so I find it a bit difficult can you guide me

#

Hello @hasty grail are you there

hasty grail Sep 23, 2020, 9:16 AM

#

Follow the link and the instructions on that page

cedar sky Sep 23, 2020, 9:16 AM

#

It doesn't seem to help

hasty grail Sep 23, 2020, 9:17 AM

#

What have you done exactly?

cedar sky Sep 23, 2020, 9:17 AM

#

I just entered: import tensorflow

hasty grail Sep 23, 2020, 9:17 AM

#

ImportError: Could not find the DLL(s) 'msvcp140_1.dll'. TensorFlow requires that these DLLs be installed in a directory that is named in your %PATH% environment variable. You may install these DLLs by downloading "Microsoft C++ Redistributable for Visual Studio 2015, 2017 and 2019" for your platform from this URL: https://support.microsoft.com/help/2977003/the-latest-supported-visual-c-downloads

#

Have you completed this step?
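
One way to check whether the DLL is actually visible after installing the redistributable is to look for it in every directory listed in PATH, which is where the error message says TensorFlow expects it. This is only a rough sketch; the helper name find_on_path is made up for illustration:

```python
import os
from pathlib import Path


def find_on_path(filename, path_value=None):
    """Return every directory listed in PATH that contains `filename`."""
    if path_value is None:
        path_value = os.environ.get("PATH", "")
    hits = []
    for entry in path_value.split(os.pathsep):
        # Skip empty entries, which can appear when PATH has stray separators.
        if entry and (Path(entry) / filename).is_file():
            hits.append(entry)
    return hits


if __name__ == "__main__":
    # DLL name taken from the ImportError message.
    print(find_on_path("msvcp140_1.dll") or "msvcp140_1.dll not found on PATH")
```

Run it with the same interpreter PyCharm uses for the project. An empty result suggests the install did not put the DLL anywhere that interpreter can see.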

cedar sky Sep 23, 2020, 9:18 AM

#

I installed it many times but it still shows the same error

hasty grail Sep 23, 2020, 9:18 AM

#

Which file did you download?

cedar sky Sep 23, 2020, 9:19 AM

#

Shall I share my screen

#

and can you help me

hasty grail Sep 23, 2020, 9:19 AM

#

Sorry I won't be on PC for long

cedar sky Sep 23, 2020, 9:19 AM

#

oh ok no problem

#

Which file did you download?
@hasty grail microsoft redistributable 2019

hasty grail Sep 23, 2020, 9:21 AM

#

Do you have to use any of your own files?

#

I feel that you might be better off simply using a Docker container otherwise

cedar sky Sep 23, 2020, 9:21 AM

#

📎 unknown.png

#

I feel that you might be better off simply using a Docker container otherwise
@hasty grail You need to use pycharm for the exam

#

I usually use Google Colab

hasty grail Sep 23, 2020, 9:22 AM

#

ahh

#

what's your Python version

cedar sky Sep 23, 2020, 9:24 AM

#

3.7 which is the one supported for the exam
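
A quick way to confirm which interpreter the PyCharm venv is really running, sketched on the assumption that the version and the 32/64-bit build are the details of interest (TensorFlow's Windows wheels are built for 64-bit Python). The function name describe_interpreter is hypothetical:

```python
import platform
import struct
import sys


def describe_interpreter():
    """Collect the interpreter details that matter when installing binary wheels."""
    return {
        "version": platform.python_version(),
        # Pointer size in bytes times 8 gives 32 or 64 for the running build.
        "bits": struct.calcsize("P") * 8,
        "executable": sys.executable,
    }


print(describe_interpreter())
```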

hasty grail Sep 23, 2020, 9:26 AM

#

can you open your Visual Studio Installer and check whether you can find the C++ Redistributable?

cedar sky Sep 23, 2020, 9:27 AM

#

let me try

#

just a sec

hasty grail Sep 23, 2020, 9:27 AM

#

haven't used the installer for a while but I think you should be able to select what packages for VS to install on there

#

look under C++

cedar sky Sep 23, 2020, 9:27 AM

#

ok

#

two mins

#

📎 unknown.png

#

This is what it shows

hasty grail Sep 23, 2020, 9:31 AM

#

that's not the installer

#

that is visual studio itself

#

go to your Start Menu and search Visual Studio Installer or something

cedar sky Sep 23, 2020, 9:32 AM

#

📎 unknown.png

hasty grail Sep 23, 2020, 9:32 AM

#

yeah that one

cedar sky Sep 23, 2020, 9:32 AM

#

This one ???

hasty grail Sep 23, 2020, 9:32 AM

#

go to the "Installed" tab

cedar sky Sep 23, 2020, 9:32 AM

#

ok

#

only visual studio community edition is there

#

📎 unknown.png

hasty grail Sep 23, 2020, 9:34 AM

#

click "more"

cedar sky Sep 23, 2020, 9:34 AM

#

ok

#

then

hasty grail Sep 23, 2020, 9:35 AM

#

what do you see now

cedar sky Sep 23, 2020, 9:35 AM

#

wait let me send

#

📎 unknown.png

#

What to do next

hasty grail Sep 23, 2020, 9:37 AM

#

modify

cedar sky Sep 23, 2020, 9:37 AM

#

kk

#

📎 unknown.png

hasty grail Sep 23, 2020, 9:38 AM

#

go to "Language packs"

cedar sky Sep 23, 2020, 9:38 AM

#

ok

#

That just shows languages like 'English', 'Tamil', 'Japanese'

hasty grail Sep 23, 2020, 9:40 AM

#

hmm ok