#data-science-and-ml

velvet thorn
#

assume one row is of shape (5,)

#

options would be of shape, say, (5, 13)

#

you can use broadcasting there

#

and then take .min()

serene scaffold
#

broadcasting?

velvet thorn
#

hold up, though

#

why are you using mean() instead of sum()

#

I see it doesn't matter in this case

#

just curious

serene scaffold
#

maybe I should use sum

#

I can check the assignment again, but I won't ask you about that to keep my question focused on numpy.

#

(to eliminate the pretense that you're doing my homework)

#

I think it needs to be mean though because for the manhattan distance, in this case, we ignore columns where either has a nan

#

so rows with fewer nans get a much higher chance to be closer.

velvet thorn
#

I think it needs to be mean though because for the manhattan distance, in this case, we ignore columns where either has a nan
if that's your intention, then fair enough

#

so rows with fewer nans get a much higher chance to be closer.
@serene scaffold why do you say this, though?

#

because assuming all columns are similarly distributed, the means would also have similar distributions, regardless of the number of non-null values in each row

#

if I understand you correctly

serene scaffold
#

I think the dataset is fake so I don't know if the distribution of nans is meaningful.

velvet thorn
#

maybe I should use sum
@serene scaffold the thing is I'm not sure the question "what is the Manhattan distance between two partially defined points" has a well-defined answer

#

to use a practical example...say you are at (0, 0) on a map. which is closer to you, (3, ?), or (1, 4)?

#

of course, this is more a question of statistical theory than programming

serene scaffold
#

right

velvet thorn
#

and I don't claim to know the intentions of whoever set you the homework

#

just thinking out loud

#

anyway, back to your question

serene scaffold
#

I can

#

this homework assignment is a trick to show how long data science programs take

#

a lot of people were emailing the TA saying that their code was running overnight

#

and he finally said "that's the point"

velvet thorn
#
>>> a = np.random.chisquare(5, size=(5, 3))
>>> a.shape
(5, 3)
>>> a.mean(axis=0).shape
(3,)
>>> a - a.mean(axis=0)
array([[-1.60512055, -1.8917574 , -0.87419636],
       [ 2.46631863,  0.17530674, -0.26093926],
       [ 4.48044859, -0.56073539, -2.20589024],
       [-2.02807185,  0.35415127,  0.15569975],
       [-3.31357484,  1.92303477,  3.18532612]])
#

this is broadcasting

#

note the shapes of the arrays

serene scaffold
#

!e

import numpy as np
print(np.random.chisquare(5, size=(5, 3)))
arctic wedge BOT
#

@serene scaffold :white_check_mark: Your eval job has completed with return code 0.

[[ 3.17295805  5.87307768  5.60451568]
 [10.90956275  2.15776996  9.61003041]
 [ 4.06197492  5.80642425  6.10925965]
 [ 3.45856202  1.74811126 10.31556199]
 [ 7.84383268  5.67000452  4.59208444]]
velvet thorn
#

the smaller array gets "copied" across the axes

serene scaffold
#

I don't know that I understand what math is happening here

velvet thorn
#

okay, so first you have an array of random numbers, right?

serene scaffold
#

I'm not sure what the lone 5 is for but I see this has a shape of 5 by 3

velvet thorn
#

where each row is one sample, and each column is one class of observations.

serene scaffold
#

and I'll accept that it's random

velvet thorn
#

so for example, one row could be an individual person's data, and one column could be all the, say, heights of the people in the dataset.

#

now, say we want to take the mean of each column; that would be a.mean(axis=0)

#

which would give us an array of shape (3,) (one mean per column)

serene scaffold
#

right

velvet thorn
#

next, we want to subtract that mean from each value in the original dataset, matching columns

#

so subtract the first mean from every value in the first column, second mean from every value in the second column, etc.

rustic apex
#

What parts of Numpy should you concentrate on? Also, what’s the difference between Numpy and Pandas? It seems like a lot ends up relying on pandas, so should you use/learn both at once?

velvet thorn
#

What parts of Numpy should you concentrate on? Also, what’s the difference between Numpy and Pandas? It seems like a lot ends up relying on pandas, so should you use/learn both at once?
@rustic apex pandas is built on numpy.

#

it depends on what you want to do.

#

pandas is more for higher-level data wrangling.

serene scaffold
#

@rustic apex numpy is specifically for math and especially linear algebra, pandas is more for tabular data in general.

velvet thorn
#

in particular, data that comes in 2D form.

#

numpy allows you to work in higher dimensions and perform lower-level operations that often don't make sense on tabular data, such as taking the outer product, matrix product, applying functions across strides, etc.

#

so subtract the first mean from every value in the first column, second mean from every value in the second column, etc.
@velvet thorn but remember your original array has shape (5, 3), and your means have shape (3,)

#

which means that a - a.mean(axis=0) shouldn't work (shape mismatch) and yet it does.

#

that's because numpy recognises, in a nutshell, that the arrays match along one axis (the last one) and for every other axis, at least one array has either length 1 or is missing that axis.

#

so it implicitly "expands" the array of means to the shape (5, 3) (conceptually) before performing the elementwise subtraction
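
The implicit expansion described above can be made visible with `np.broadcast_to`. This is only a sketch: the seeded generator and the variable names are choices made for the example, not part of the original session.

```python
import numpy as np

rng = np.random.default_rng(0)
a = rng.chisquare(5, size=(5, 3))   # 5 samples (rows), 3 observation classes (columns)
col_means = a.mean(axis=0)          # shape (3,): one mean per column

# What broadcasting does conceptually: treat the (3,) means as a (5, 3) array
# whose rows are all identical, without actually copying the data.
expanded = np.broadcast_to(col_means, a.shape)

centred = a - col_means             # broadcasting happens implicitly here
explicit = a - expanded             # same subtraction with the expansion spelled out

print(col_means.shape, expanded.shape, centred.shape)
print(np.allclose(centred, explicit))
```

Both subtractions give the same result; the implicit form is simply the one you would normally write.
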

rustic apex
#

Thank you, I’m interested in using CSV and predicting data. I’m just getting started, it’s clicking with me well

velvet thorn
#

Thank you, I’m interested in using CSV and predicting data. I’m just getting started, it’s clicking with me well
@rustic apex "using CSV"?

#

what do you mean by that?

rustic apex
#

The CSV file?

serene scaffold
#

CSV is just comma separated values

rustic apex
#

I mean that as in just using data

merry fern
#

pandas dataframes...@velvet thorn this is what I got so far with trying to create a new column value based on existing columns, but I can't figure out how to get it to work.

def admin_mapper(df):
    if df['Name'].str.startswith('RP', na=False) or df['Name'].str.startswith('RV', na=False) and df['Price'] == 0:
        return 'Repurchase Agreement'
    elif df['Name'].str.startswith('BUY', na=False) or df['Name'].str.startswith('SELL', na=False):
        return 'CDS'
    elif df['Price'] != "0":
        return 'Bond'
df_admin['Type'] = df_admin.apply(lambda row: admin_mapper(df_admin), axis=1)
velvet thorn
#

saying "using CSV" is kind of like "I want to learn driving and I'm excited to use 95 octane petrol"

#

@merry fern yes, I saw your message earlier

#

I'll get to it later.

#

also, you didn't answer my question.

rustic apex
#

Haha, yea I mean using data and showing predictions

velvet thorn
#

Haha, yea I mean using data and showing predictions
@rustic apex fair enough

merry fern
#

thanks, i just updated it. I am not sure where I am going to go with this in terms of other values...

velvet thorn
#

I would say focus on pandas, but understand numpy

#

thanks, i just updated it. I am not sure where I am going to go with this in terms of other values...
@merry fern you could return a placeholder or None first.

#

anyway, your code looks more or less right

#

but replace or with |

rustic apex
#

How good should you be at math, and at what type? I taught myself Trig and Pre Cal

serene scaffold
#

@velvet thorn I'm not sure I see the application. The manhattan distance formula I'm using ignores indices for which either is nan

merry fern
#

ah, and with & ?

#

ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().

serene scaffold
#

so I assume it requires individual calls to manhattan_distance each time

velvet thorn
#

so I assume it requires individual calls to manhattan_distance each time
@serene scaffold you could do it that way, I guess?

#

it would be slower and kind of bad practice

#

so I would suggest using vectorisation in combination with np.nanmean instead

#

then you could entirely cut out the inner loop
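
A rough sketch of what that could look like, assuming the candidates are stacked one per row in a 2D array (the name `manhattan_to_all`, the sample data, and that layout are assumptions for illustration, not the assignment's actual code):

```python
import numpy as np

def manhattan_to_all(row: np.ndarray, options: np.ndarray) -> np.ndarray:
    """Mean absolute difference between `row` (d,) and each row of `options` (n, d),
    skipping any column where either value is NaN."""
    diffs = np.abs(options - row)        # (n, d); NaN wherever either input is NaN
    return np.nanmean(diffs, axis=1)     # (n,); NaN entries are left out of each mean

row = np.array([1.0, np.nan, 3.0])
options = np.array([
    [1.0, 5.0, 4.0],       # usable columns 0 and 2 -> (0 + 1) / 2
    [np.nan, 2.0, 0.0],    # only column 2 usable   -> 3 / 1
    [2.0, np.nan, 3.0],    # usable columns 0 and 2 -> (1 + 0) / 2
])
distances = manhattan_to_all(row, options)
print(distances)
print(distances.argmin())  # index of the closest candidate
```

Broadcasting replaces the loop over candidates, and `np.nanmean` replaces the explicit NaN mask. A candidate with no usable columns at all comes back as NaN (with a RuntimeWarning), so that case may still need separate handling.
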

serene scaffold
#

is there a way to use numpy to get some performance advantage in that case?

velvet thorn
#

ah, and with & ?
@merry fern oh wait, I misread your code

#

okay, you know one thing you can do?

#

make a subsidiary DataFrame of conditions

#

then play with that.

#

e.g.

#
columns = ['starts_with_rp', 'starts_with_rv', 'starts_with_buy', 'starts_with_sell', 'zero_price']
conditions_df = pd.concat([
    df['Name'].str.startswith('RP', na=False),
    df['Name'].str.startswith('RV', na=False),
    df['Name'].str.startswith('BUY', na=False),
    df['Name'].str.startswith('SELL', na=False),
    df['Price'] == 0,
], axis=1).rename(columns=columns)
#

then you can just use masking to assign

#

e.g. df.loc[~conditions_df['zero_price'], 'Type'] = 'Bond' would take the place of your last condition

#

do it in reverse order of priority
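
As a sketch of the "reverse order of priority" idea: the sample rows below are made up, and the rule is read as "(RP or RV) and zero price", which is an assumption about the intent of the original `if`.

```python
import pandas as pd

# Hypothetical sample data; the column names follow the snippet in the chat.
df = pd.DataFrame({
    "Name": ["RP ABC", "BUY XYZ", "ACME 5% 2030", None, "RV DEF"],
    "Price": [0, 101.5, 99.0, 0, 0],
})

name = df["Name"].str
is_repo = name.startswith("RP", na=False) | name.startswith("RV", na=False)
is_cds = name.startswith("BUY", na=False) | name.startswith("SELL", na=False)
zero_price = df["Price"] == 0

# Lowest priority first, so that later assignments overwrite earlier ones.
df["Type"] = None
df.loc[~zero_price, "Type"] = "Bond"
df.loc[is_cds, "Type"] = "CDS"
df.loc[is_repo & zero_price, "Type"] = "Repurchase Agreement"
print(df)
```

Rows that match no rule keep the placeholder `None`, which mirrors the suggestion to return a placeholder first.
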

#

is there a way to use numpy to get some performance advantage in that case?
@serene scaffold yup, vectorisation

merry fern
#

yikes

velvet thorn
#

this kind of uses concepts that you probably haven't been exposed to, which makes it all seem a bit abstract

#

like broadcasting

#

but the main idea is that broadcasting, in general, takes the place of loops in numpy

#

however, from a distance...

#

can you see how using nanmean might help?

#

yikes
@merry fern do you see what I mean?

merry fern
#

im trying to interpret it and understand its use case now

velvet thorn
#

okay.

#

so basically

merry fern
#

would you then combine conditions_df w/ original df ?

velvet thorn
#

that creates a new DataFrame with 5 columns

#

and the same index as the original DataFrame

#

and crucially, because of that, you can use it to index the original DataFrame.

#

how about you run it first

#

and look at the result

merry fern
#

why is it important that you are re-indexing the dataframe?

#

ok

velvet thorn
#

why is it important that you are re-indexing the dataframe?
@merry fern not re-indexing

#

the point is, for example

#

you can do df[conditions_df['starts_with_rp']] to give you all the rows in the original DF where the Name column starts with 'RP'.

#

do you see the utility of that?

merry fern
#

absolutely

serene scaffold
#

matrix[~np.isnan(matrix[:, j])]

velvet thorn
#

and you can combine that with other columns of conditions_df

serene scaffold
#

it doesn't work because the mask is a different shape

merry fern
#

essentially i was trying to figure out, how do i play with the dataframe now that ive gathered the data

velvet thorn
#

to access the rows that you want

merry fern
#

right

#

columns=columns is throwing an error

velvet thorn
#

oh, right, that won't work

#

uh

#

columns=dict(enumerate(columns)) should work

merry fern
#

pycharm saying:
Unexpected type(s): (enumerate[str]) Possible types: (Mapping) (Iterable[Tuple[Any, Any]])

velvet thorn
#

matrix[~np.isnan(matrix[:, j])]
@serene scaffold huh. it doesn't...?

serene scaffold
#
IndexError: boolean index did not match indexed array along dimension 1; dimension is 10 but corresponding boolean dimension is 1
velvet thorn
#

pycharm saying:
Unexpected type(s): (enumerate[str]) Possible types: (Mapping) (Iterable[Tuple[Any, Any]])
@merry fern that shouldn't be a problem

#
IndexError: boolean index did not match indexed array along dimension 1; dimension is 10 but corresponding boolean dimension is 1

@serene scaffold matrix is 2D, right?

serene scaffold
#

let's be sure, I guess

#

yeah it is

velvet thorn
#

that...shouldn't be happening

serene scaffold
#

(~np.isnan(matrix[:, j])).shape is (8000,1)

merry fern
#

hmm. KeyError: 'starts_with_rp'

velvet thorn
#

hmm. KeyError: 'starts_with_rp'
@merry fern that means the columns weren't renamed properly

#

just assign conditions_df.columns = columns I guess

#

so much for using rename 🤷‍♂️
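
A likely reason `rename` had no effect, shown as a sketch on made-up data: each condition Series keeps the name of the column it came from, so the concatenated frame has no columns labelled 0, 1, 2 for the positional mapping to hit. Passing a dict to `pd.concat` is one way to name the columns up front.

```python
import pandas as pd

# Hypothetical sample data; only three of the five conditions are shown.
df = pd.DataFrame({"Name": ["RP ABC", "BUY XYZ", None], "Price": [0, 101.5, 99.0]})

conditions = {
    "starts_with_rp": df["Name"].str.startswith("RP", na=False),
    "starts_with_buy": df["Name"].str.startswith("BUY", na=False),
    "zero_price": df["Price"] == 0,
}

# Plain concat reuses the Series names, so the columns are not 0, 1, 2.
plain = pd.concat(list(conditions.values()), axis=1)
print(list(plain.columns))

# With a dict, the keys become the column names directly.
conditions_df = pd.concat(conditions, axis=1)
print(list(conditions_df.columns))
print(df[conditions_df["starts_with_rp"]])
```

Assigning `conditions_df.columns = columns` afterwards, as suggested above, gets to the same place.
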

#

(~np.isnan(matrix[:, j])).shape is (8000,1)
@serene scaffold that's not right

#

it should be shape (8000,)

serene scaffold
#

why

velvet thorn
#

because you're indexing on the columns

#

so that dimension should disappear

#

if matrix is 2D, matrix[:, j] should be 1D, assuming j is an int

#
>>> a = np.zeros(shape=(8000, 10))
>>> a[:, 1].shape
(8000,)
merry fern
#

@velvet thorn, so your code made a list of rules, then made a new df w/ the conditions, then renamed the columns in the same order as the rules, right?

serene scaffold
#

idk what to say

velvet thorn
#

@velvet thorn, so your code made a list of rules, then made a new df w/ the conditions, then renamed the columns in the same order as the rules, right?
@merry fern yup

#

idk what to say
@serene scaffold you checked matrix.shape, right?

serene scaffold
#

yes

velvet thorn
#

okay, last resort

#

what version is your numpy

serene scaffold
#

let's see

#

1.19.2

velvet thorn
#

okay, I'm stumped

serene scaffold
#

😮

#

🤯

velvet thorn
#

the best guess I can make is that somewhere matrix is getting an extra axis

#

because that wouldn't make sense otherwise

#

if you index with an int, one axis disappears

#

this is like the most fundamental thing in numpy

#

actually

#

OH

#

I know why now

#

it's because np.where returns a tuple

#

so for j in... is taking the first element of that tuple, which is an array

#

so j is not in fact an int, which leads to a 2D array upon slicing

#
>>> np.where([1, 0, 1])
(array([0, 2]),)
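
The effect can be reproduced in a few lines. The loop that produced `j` was never posted, so the matrix and mask here are a hypothetical reconstruction:

```python
import numpy as np

matrix = np.zeros((8000, 10))
matrix[:, 3] = np.nan                      # hypothetical: column 3 is entirely NaN
all_nan = np.isnan(matrix).all(axis=0)     # one boolean per column, shape (10,)

result = np.where(all_nan)                 # a 1-tuple holding one index array

for j in result:                           # iterates over the tuple, so j is an array
    print(matrix[:, j].shape)              # (8000, 1)

for j in result[0]:                        # iterate over the index array instead
    print(matrix[:, j].shape)              # (8000,)

# np.flatnonzero returns the index array directly, with no tuple to unpack.
print(np.flatnonzero(all_nan))
```

Indexing a column with an array keeps that axis, which is where the unexpected `(8000, 1)` mask shape comes from.
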
slender nymph
#

I have to find the latest date of the tickers (stock of companies) that have gone bankrupt. I actually want to have all those who no longer publish themselves on the market.

you will see in this table there are a lot of stocks. some no longer publish because they have gone bankrupt. I'd like to have all those who stopped publishing

i know there are 7391 of those who stopped publishing

serene scaffold
#

@velvet thorn oh here's something

velvet thorn
#

I have to find the latest date of the tickers (stock of companies) that have gone bankrupt. I actually want to have all those who no longer publish themselves on the market.

you will see in this table there are a lot of stocks. some no longer publish because they have gone bankrupt. I'd like to have all those who stopped publishing

i know there are 7391 of those who stopped publishing
@slender nymph define "stopped publishing"

slender nymph
serene scaffold
#

j is an ndarray

#

that's weird

velvet thorn
#

yeah

#

that's what I said above actually

serene scaffold
#

hmm

velvet thorn
#

right before you said that

serene scaffold
#

oh, I didn't see it because I have discord in a small window

#

and the other user made a comment

slender nymph
#

how can I resolve that?

#

can someone help me get started?

serene scaffold
#

well it's running @velvet thorn

velvet thorn
#

yeah, we came to the same realisation

#

anyway, that's why there was that problem

serene scaffold
#

so I guess we'll see tomorrow if it worked

velvet thorn
#

I forgot about np.where

merry fern
#

@velvet thorn, i guess i can't do this:

conditions_df = pd.concat([
    df_admin['Name'].str.startswith('RP', na=False) | df_admin['Name'].str.startswith('RV', na=False),
    df_admin['Name'].str.startswith('BUY', na=False) | df_admin['Name'].str.startswith('SELL', na=False),
    df_admin['Price'] != 0,
], axis=1)

multiple conditions in there

velvet thorn
#

you could, but the point is to generate the conditions individually

#

and then use them to create another column

#

that is fine too, though

merry fern
#

it didnt work... but i see what youre saying

#

bc then i can reference the various characteristics

velvet thorn
#

why didn't it work

#

it should work

merry fern
#

when i do my mod the files come out empty

velvet thorn
#

your conditions are wrong, I think?

merry fern
#

df_admin[conditions_df[['starts_with_rp', 'starts_with_rv', 'starts_with_buy', 'starts_with_sell', 'zero_price']]].to_csv(r'csv_filteroutput' + fntime + '.csv')
anything wrong there?

#

i added a timestamp variable so i can keep running it without having to close the files 🙂

#

if i can DM u ill send u files

velvet thorn
#

uh...

#

not sure what you're trying to do there

merry fern
#

i was trying to export all the values in that second list

#

but i guess i can just do conditions

velvet thorn
#

conditions_df[['starts_with_rp', 'starts_with_rv', 'starts_with_buy', 'starts_with_sell', 'zero_price']].to_csv()?

#

why are you indexing and then saving

merry fern
#

yea i have no idea what i was doing there haha

#

i think i left in df_admin accidentally

#

i wanted to see what was in conditions_df, but pycharm limits it to 5 rows, wasn't sure how to print it verbosely

#

@velvet thorn i see now that conditions_df creates a df of bools? interesting. how did you learn python/coding? whats your history

velvet thorn
#

uh

#

I went for a coding bootcamp

#

then I worked at a startup for a few months

#

then I worked at another startup for a few months

#

then I worked as a data science instructor for a few months

#

now I'm building my own startup

merry fern
#

i guess more what i was asking was when/how did you make the most advancement in your knowledge?

was it all company based projects or on your own?

#

whats your startup?

velvet thorn
#

by doing stuff

#

think of something cool to build, build it

#

I learn by doing

#

whats your startup?
@merry fern edutech related, basically

merry fern
#

me too. thats what im trying to do now, but keep stumbling

#

cool

rustic apex
#

What are some topics to focus on while learning Numpy?

merry fern
#

so now I need to write a mapper to utilize conditions_df, or how would you approach this?

velvet thorn
#

What are some topics to focus on while learning Numpy?
@rustic apex don't think of topics.

#

think of things you wanna do

#

(IMO)

#

so now I need to write a mapper to utilize conditions_df, or how would you approach this?
@merry fern well.

#

the CLEANEST way

#

would be to create a third DataFrame

#

and join the two

#

that would be my preferred approach.

#

do you understand the concept of "join"?

#

like in the SQL sense

merry fern
#

the CLEANEST way
@velvet thorn THIS is where im getting caught up - i keep trying to figure out if what im doing is even efficient or standard

#

yes

velvet thorn
#

well

#

you need to have a certain kind of mind

#

to get things intuitively, I guess?

#

but hard work works too

#

gotta understand your strengths and weaknesses

merry fern
#

i recently read that iterating is bad, and vectorization is the way to go

velvet thorn
#

anyway, you can think of your problem

#

as basically

#

joining on the condition columns.

#

do you understand why?

#

@rustic apex like of course you should understand concepts like vectorisation, broadcasting and indexing...

merry fern
#

no i cant really, what do you mean "joining on"

velvet thorn
#

...but studying them in isolation is likely to confuse you.

rustic apex
#

@velvet thorn well, Pandas makes sense with loading files and how to display certain things, but I’ve just gotten used to arrays with Numpy

velvet thorn
#

so it's better to find things that you need to do

#

and learn how to do them with numpy.

#

no i cant really, what do you mean "joining on"
@merry fern okay

#

so

#

conceptually

#

what you want to do

#

okay never mind

#

you understand

#

what a "join" is, right?

#

and you know you join on columns?

merry fern
#

im familiar with outer/inner join, but i am a newb @ getting my hands dirty with handling data like this

rustic apex
#

I understand it a little

merry fern
#

@velvet thorn The problem: I need to go through each row in a database & check 2 columns' values. based on those 2 columns' values, I need to create a new column and give it a value of "1,2,3" - But, yes, you are correct where I think you were going w/ your question before. Ideally, I'd like to be able to digest and act on a variety of data that comes in. I just started with this 3 example because its more than 2 (True/False)

I always just think of for loops to do this but that's not exactly python's way, right?

#

@velvet thorn - I see, both dataframes retained identical indices

velvet thorn
#

@merry fern okay, I need to go, but left joins are the solution to your problem

#

you need a third DataFrame to represent the correspondence of the various conditions to your final output value, with, of course, one row per output value (so 3 in this case)

#

then you join that to conditions_df, with conditions_df on the left

#

think about how that would work

merry fern
#

think about how that would work
@velvet thorn yes i was just working on that

#

thanks for the help

serene scaffold
#
  return _methods._mean(a, axis=axis, dtype=dtype,
C:\development\school\data_science\venv\lib\site-packages\numpy\core\_methods.py:170: RuntimeWarning: invalid value encountered in double_scalars
  ret = ret.dtype.type(ret / rcount)
#

this is the whole traceback that I get

#

so I don't even know what in my code is causing this error

#

the line before it is an expected print statement

#

huh, I guess it's just a warning

#

wish it gave me more info though

#
def manhattan_distance(x: np.ndarray, y: np.ndarray) -> float:
    not_null = ~np.isnan(x) & ~np.isnan(y) & BOOL_MASK
    x, y = x[not_null], y[not_null]
    return np.mean(np.absolute(x - y))

I'm wondering if using functools.lru_cache will speed this up

#

hashing the two arrays might take too long

#

eh well I guess you can't do that anyway

hasty grail
#

I think you can apply the mask after calculating the absolute value, that way you don't have to do it twice

serene scaffold
#

@hasty grail I have to apply the mask first, but the problem is that I have it written in such a way that this function will likely get called on the same pair of arrays a couple times

#

though that number is at most ten

velvet thorn
#

this is the mapping
@merry fern looks good

#

so I don't even know what in my code is causing this error
@serene scaffold it says "mean of empty slice"

#

and you have only one call to .mean, so...

#
def manhattan_distance(x: np.ndarray, y: np.ndarray) -> float:
    not_null = ~np.isnan(x) & ~np.isnan(y) & BOOL_MASK
    x, y = x[not_null], y[not_null]
    return np.mean(np.absolute(x - y))

I'm wondering if using functools.lru_cache will speed this up
@serene scaffold you don't wanna use np.nanmean?

serene scaffold
#

@velvet thorn well, I already finished it and got the results, so it's now the TA's problem.

#

I'm not sure what nanmean would afford me in this context?

velvet thorn
#

just np.nanmean(np.absolute(x - y)[BOOL_MASK]) (whatever BOOL_MASK is)

#

without masking

serene scaffold
#

well I also need to mask the last row every time

#

that's what BOOL_MASK is for

velvet thorn
#

okay, so that then
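
Putting the two suggestions together, a minimal sketch of the `np.nanmean` version might look like this. The `BOOL_MASK` value is a hypothetical stand-in (the real one was never shown), and the guard for "nothing left to average" is an addition meant to avoid the "Mean of empty slice" warning:

```python
import numpy as np

# Hypothetical stand-in for the BOOL_MASK in the chat: drop the last entry.
BOOL_MASK = np.array([True, True, True, False])

def manhattan_distance(x: np.ndarray, y: np.ndarray) -> float:
    diffs = np.abs(x - y)[BOOL_MASK]     # NaN wherever x or y is NaN
    if np.isnan(diffs).all():            # no usable columns left to average
        return float("nan")
    return float(np.nanmean(diffs))      # NaN entries are skipped

x = np.array([1.0, np.nan, 3.0, 100.0])
y = np.array([2.0, 5.0, np.nan, 0.0])
print(manhattan_distance(x, y))          # only column 0 is usable
```

Whether returning NaN is the right answer for two rows with no columns in common is a modelling decision; the caller could equally treat that pair as infinitely far apart.
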

serene scaffold
#

though I don't know that that's the bottleneck

#

because the problem is that this function can be called up to nine times for the same x, y

#

it's incredibly unlikely that it would ever reach nine

#

but I have the results

#

so... py_strong

slate hollow
#

hey so for tensorflow, in a book i read that you could create a model through the functional api through smth like this:
keras.layers.Dense(30, activation='relu')(input_)
but then later in the book they define something like this:
def call(self, inputs):
    Z = inputs
    for layer in self.hidden:  # hidden is a list of dense layers
        Z = layer(Z)
    return inputs + Z

#

so... what's with the inconsistency? (ping plz)

velvet thorn
#

though I don't know that that's the bottleneck
@serene scaffold the point is to remove the inner loop, I guess

#

using vectorisation and broadcasting, as discussed earlier

serene scaffold
#

I might look into that in the future, though for some reason this is the only programming assignment for this course.

velvet thorn
#

@slate hollow what do you mean...?

slate hollow
#

i mean in both things they're calling the layer

#

but one supposedly tells keras how to connect the layers, while the other applies the functions to the inputs

velvet thorn
#

return inputs + Z this is weird to me

#

not really sure what's up with that

slate hollow
#

oh um it's like some hypothetical addition thing

#

nothing wrong

velvet thorn
#

okay

#

then I don't really see the inconsistency...?

#

in both cases they're applying layers to layers

slate hollow
#

i mean

#

the first one takes another layer

#

the second takes an array or something

velvet thorn
#

yeah, it takes a number of layers

#

and it successively applies them

slate hollow
#

wait what

#

i thought it just took an array of numbers

velvet thorn
#

oh, wait, never mind

#

my bad I read wrongly

#

you're right

#

the for loop is the successive application

#

the addition is the last part

#

yeah, it takes a number of layers
@velvet thorn the layers are in self.hidden, whatever that is

slate hollow
#

yeah..?

#

1094/1094 [==============================] - 11s 10ms/step - loss: 0.5547 - accuracy: 0.8070 - val_loss: 1.9851 - val_accuracy: 0.4737
pain

limpid oak
#

I have column values like '6 months', 'months 12' , any method to extract int values from it

velvet thorn
#

I have column values like '6 months', 'months 12' , any method to extract int values from it
@limpid oak df['column_name'].str.extract(r'\d+') should work

limpid oak
#

thank you @velvet thorn , let me try

velvet thorn
#

note that

#

you still need

#

to convert to int if you want to use it as an int

limpid oak
#

@velvet thorn ValueError: pattern contains no capture groups

velvet thorn
#

r'(\d+)'

limpid oak
#

IntValues = int(df['Months'].str.extract(r'\d+'))

#

throwing same error

odd yoke
#

You didn't replace the regex

limpid oak
#

?

odd yoke
#

r'(\d+)'

limpid oak
#

this works perfect IntValues = df['Months'].str.extract(r'(\d+)').astype(int)

#

thank you so much for your kind support @velvet thorn and @odd yoke
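
For reference, the working version on a few sample values (the sample strings are made up; `Months` is the column name used above). `expand=False` is an optional extra that returns a Series rather than a one-column DataFrame:

```python
import pandas as pd

df = pd.DataFrame({"Months": ["6 months", "months 12", "36 months"]})  # sample values

# str.extract needs a capture group: the parentheses mark what to pull out.
months = df["Months"].str.extract(r"(\d+)", expand=False).astype(int)
print(months.tolist())  # [6, 12, 36]
```

If some rows contain no digits, `extract` yields NaN for them and `astype(int)` will raise, so `pd.to_numeric(..., errors="coerce")` may be a safer conversion in that case.
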

odd yoke
#

How to get recognition for free 101

mild topaz
#

np.ndarray.shape
@hasty grail what i can do with this?

hasty grail
#

you get the image's height and width

fossil vapor
#

Hi everyone, what could be the reasons for the kernel to appear dead while running a cell in a Jupyter notebook? I use a MacBook Pro, OS = OS X Yosemite version 10.10.5, 2.4 GHz Intel Core 2 Duo.

limpid oak
#

it might be taking too much memory to process

#

you can process your data in batches

fossil vapor
#

I tried processing in batches and got the same error message about the kernel

limpid oak
#

try it on google colab

fossil vapor
#

Will try that, Thanks

limpid oak
#

welcome

mild topaz
#

Traceback (most recent call last):

File "E:\paymentz\template.py", line 8, in <module>
np.ndarray.shape(img)

TypeError: 'getset_descriptor' object is not callable @hasty grail

hasty grail
#

img is a np.ndarray

#

so you just do img.shape
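
A small illustration, using a blank array as a stand-in for a loaded image (the 480x640 size is arbitrary):

```python
import numpy as np

img = np.zeros((480, 640, 3), dtype=np.uint8)   # stand-in for a loaded colour image

# shape is an attribute of the array, not a function to call on it.
height, width = img.shape[:2]
print(height, width)  # 480 640
```

Calling `np.ndarray.shape(img)` fails because `shape` on the class is a descriptor, which is what the `getset_descriptor` error is complaining about.
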

mild topaz
hasty grail
#

your link is not working

mild topaz
hasty grail
#

sure it's possible

mild topaz
#

if u find any tutorial please do share , i m also looking for this

#

sure it's possible
@hasty grail ok

hasty grail
#

what are your results with OpenCV?

mild topaz
#

(array([], dtype=int64), array([], dtype=int64))

hasty grail
#

do you know what that means?

mild topaz
#

actually i am doing different task now

#

see this @hasty grail
template matching

hasty grail
#

what are your results with OpenCV?

#

Weren't you trying to do template matching with it?

mild topaz
#

yes i was

hasty grail
#

and....

mild topaz
#

my team mate told me to change the task for now

hasty grail
#

ok...

mild topaz
#

yes

#

if u find any tutorial pls do share

hasty grail
#

Well idk what are you doing now

mild topaz
#

can i explain u my problem ?

#

i want to make a template

#

of documents

hasty grail
#

Just explain it here

mild topaz
#

i want to make similar to this @hasty grail

hasty grail
#

But that image shows template matching as well

#

How is it any different?

mild topaz
#

which image

hasty grail
#

The one you just linked

mild topaz
#

i know ,

#

but i need a tutorial

mild topaz
#

please do share if you find any

tidal sonnet
#

I understand the first one... But why when theta1 = 0.5, that the point must go thru the 2,1 mark?

#

this is uhm...
Linear regression with one variable

smoky kite
#

@tidal sonnet Because if x=2, h(2) = 0.5 x 2 which is 1

hasty grail
#

Did you mix up x and y?

#

@mild topaz Sorry, I don't have experience in that specific area of OpenCV

tidal sonnet
#

POG!

#

Wait... but how do we know that x is 2 in that case?

#

I was not given a value for x... just told that if theta1 was 0.5, and theta0 was 0, that the line would have to go thru the (2,1) point :(

hasty grail
#

the line has an equation y = 1/2 * x

#

Therefore the point (2, 1) resides on that line by definition

#

Any point that fits that equation exactly lies on that line
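
The same check in code, as a tiny sketch with the hypothesis written as `h(x) = theta0 + theta1 * x` (the function name `h` follows the message above; the default values are the ones from the question):

```python
def h(x: float, theta0: float = 0.0, theta1: float = 0.5) -> float:
    """Hypothesis for linear regression with one variable."""
    return theta0 + theta1 * x

# Every x gives one point on the line; x = 2 happens to give the point (2, 1).
for x in (0, 1, 2, 3):
    print((x, h(x)))
```

Nothing fixes x at 2: the line passes through (2, 1) simply because plugging in 2 yields 1, just as it passes through (0, 0) and (3, 1.5).
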

tidal sonnet
#

I see... Thank you

merry fern
#

@velvet thorn, is this a good approach to creating that 3rd dataframe? its unclear to me how to create the Type column in the original dataframe using the conditions df you suggested and then this logic i put together

df_admin = pd.read_excel(
    filenames['admin'],
    sheets['admin'],
    header=0,
    usecols=[3, 5, 6, 7],
    names=['ISIN', 'Name', 'Price', 'Quantity']
)
columns = ['starts_with_rp', 'starts_with_rv', 'starts_with_buy', 'starts_with_sell', 'zero_price']
conditions_df = pd.concat([
    df_admin['Name'].str.startswith('RP', na=False),
    df_admin['Name'].str.startswith('RV', na=False),
    df_admin['Name'].str.startswith('BUY', na=False),
    df_admin['Name'].str.startswith('SELL', na=False),
    df_admin['Price'] != 0,
], axis=1)
conditions_df.columns = columns
type_mapping_data = {'Type': ['Repurchase Agreement', 'Repurchase Agreement', 'CDS', 'CDS', 'Bond'],
                     'starts_with_rp': [1, 0, 0, 0, 1],
                     'starts_with_rv': [0, 1, 0, 0, 1],
                     'starts_with_buy': [0, 0, 1, 0],
                     'starts_with_sell': [0, 0, 0, 1],
                     'zero_price': [0, 0, 0, 0, 0]
}
# Create the Type column using df_admin and conditions_df.
#

I'm reading up on pandas.merge now

#

rather than simply joining 2 dfs, i suppose i am trying to join and create a column based on rules?
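
One way the left join suggested earlier could work, as a sketch on made-up rows. It collapses the five condition columns into two mutually exclusive ones (`is_repo`, `is_cds`), which is a simplification introduced here, and it treats "neither" as a Bond:

```python
import pandas as pd

df_admin = pd.DataFrame({   # hypothetical rows standing in for the Excel sheet
    "Name": ["RP ABC", "SELL XYZ", "ACME 5% 2030", "RV DEF"],
    "Price": [0, 0, 99.0, 0],
})

def starts_with(prefix: str) -> pd.Series:
    return df_admin["Name"].str.startswith(prefix, na=False)

# Two mutually exclusive conditions, collapsed from the five columns in the chat.
conditions_df = pd.DataFrame({
    "is_repo": (starts_with("RP") | starts_with("RV")) & (df_admin["Price"] == 0),
    "is_cds": starts_with("BUY") | starts_with("SELL"),
})

# One row per output value: which combination of conditions maps to which Type.
type_mapping = pd.DataFrame({
    "is_repo": [True, False, False],
    "is_cds": [False, True, False],
    "Type": ["Repurchase Agreement", "CDS", "Bond"],
})

# The left join keeps every row of conditions_df, in its original order.
merged = conditions_df.merge(type_mapping, on=["is_repo", "is_cds"], how="left")
df_admin["Type"] = merged["Type"].to_numpy()
print(df_admin)
```

The mapping table is the only place the rules live, so adding a new Type means adding a row rather than another branch. Any combination missing from the table shows up as NaN after the join, which makes gaps in the rules easy to spot.
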

grave frost
#

can Anyone show me real quick how to iterate and replace all elements with some other ones in a column of a Pandas DataFrame that is memory efficient (I have 2.5 Million rows)?

I basically want to iterate over all elements in first row, and then pass each element through a function, the output (it would return the exact value to be put in place, not a variable) of the function to be put in place of the element...

#

I tried something like:-
for item in df[0]: item = numerise(item)

#

But it doesn't work

jaunty scroll
#

What is the function? I think using a lambda might work...just a suggestion and possibly not valid

grave frost
#

@jaunty scroll function is named numerise

#

Tried a for-loop with replace and it's been going on till now...

jaunty scroll
#

Yea I'm really not sure I just thought I'd offer that as a possibility

#

But a lambda should do basically the same thing here from what I can tell

grave frost
#

Can you provide some pseudocode on how I should accomplish that?

#

This is what I was using. the count is the control variable since I was thinking that it was stuck in an infinite loop:-
count = int(0)
for item in DataFrame[int(Row)]:
    count += 1
    DataFrame[Row].replace(item, numerise(item), inplace=True)
    if count == 2500000:
        break

tidal bough
#

I believe it's also possible to just apply a function to a column in-place.
EDIT: hmm, maybe not
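
For reference, a minimal sketch of the usual pattern: map the function over the column and assign the result back, rather than looping. `numerise` here is a hypothetical stand-in (the real function was not posted), and the toy DataFrame is made up.

```python
import pandas as pd


def numerise(value):
    # Hypothetical stand-in for the real function, which was not shown.
    return len(str(value))


df = pd.DataFrame({0: ["abc", "de", "f"], 1: [10, 20, 30]})

# Rebinding `item` inside a for-loop never touches the DataFrame;
# assigning the mapped Series back to the column does.
df[0] = df[0].map(numerise)
print(df)
```

This still calls the Python function once per element, so it is only as fast as the function itself, but it avoids the repeated full-column `replace` calls.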

grave frost
#

Well, I thought that this kind of stuff would be easy in Pandas. I can iterate over the required values, but not replace them

jaunty scroll
#

trying to figure out how to represent this in a lambda, not sure if this is gonna work

grave frost
#

I restarted the whole thing, but still no luck. maybe it's not doing anything. Lemme debug it

jaunty scroll
#

but it would be something like output = lambda x, numerise: numerise(x)

#

I just don't know if it would support a single statement after the colon like that

grave frost
#

ohk, I tried with 25 values - Function works but it runs it 4 TIMES! so for 25 values it made it go 100 times. This sucks

#

for 2.5 Million rows, I will die

glacial rune
#

Is there a way to write to a csv file row by row without including \r\n? I'm importing that CSV file into a SQL database and it's including the \r\n I think

jaunty scroll
#

are you using the csv module?

glacial rune
#

yes

jaunty scroll
#

I am looking back at a script I wrote recently and I just used the w.writerow() command and didn't have to use \r\n

glacial rune
#
with open(csv_file, 'w', newline='') as out:
  csv_writer = csv.writer(out)
  csv_writer.writerows(data)
#

something like that

#

but when I open the raw .csv file in notepad++ there's a CR and LF at the end of each line

jaunty scroll
#

maybe just run rstrip? kind of a janky fix I guess but it might work
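
A possible alternative to stripping afterwards, sketched with made-up rows: `csv.writer` ends each row with `\r\n` by default, and its `lineterminator` argument overrides that. An in-memory buffer is used here only so the result can be inspected.

```python
import csv
import io

data = [["id", "name"], [1, "alpha"], [2, "beta"]]

# csv.writer ends each row with "\r\n" by default; lineterminator overrides that.
buffer = io.StringIO()
csv_writer = csv.writer(buffer, lineterminator="\n")
csv_writer.writerows(data)

print(repr(buffer.getvalue()))
```

When writing to a real file, keeping `newline=''` in `open()` as in the snippet above stops Python from translating the line endings a second time.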

mild topaz
#

how i can do template matching on document images ? in python

grave frost
#

Plz can someone provide help on my problem?

merry fern
#

@grave frost you can try to get a help room

grave frost
#

@merry fern What's that?

merry fern
#

@merry fern What's that?
@grave frost on the left bar there is a section that says "python help: available" - you go into an available room and immediately post your question / code and someone will try to help

grave frost
#

So it's like a private room for only my problem? Nice feature!

merry fern
#

Correct

velvet thorn
#

@grave frost what is the function?

grave frost
#

@velvet thorn it is a function to convert the string to a numerical format (one-hot encoding)

velvet thorn
#

@merry fern you want to combine the original dataframe and conditions_df, then join the result on that onto your third dataframe

#

@velvet thorn it is a function to convert the string to a numerical format (one-hot encoding)
@grave frost show code.

#

and explain exactly what you want to do

merry fern
#

@merry fern you want to combine the original dataframe and conditions_df, then join the result on that onto your third dataframe
@velvet thorn not sure how to do that, but im making progress, bc i couldnt figure out the logic in joining dfs, i started playing with another method which works! again, im just not sure the best most efficient way to go about solving the problem :

conditions_admin = [
    df_admin['ISIN'].isnull,
    (((df_admin['Name'].str.startswith('RP', na=False)) | (df_admin['Name'].str.startswith('RV', na=False))) & (df_admin['Price'] == 0)),
    ((df_admin['Name'].str.startswith('BUY', na=False)) | (df_admin['Name'].str.startswith('SELL', na=False))),
    (((~df_admin['Name'].str.startswith('RP', na=False)) | (~df_admin['Name'].str.startswith('RV', na=False))) & (df_admin['Price'] != 0))
]
values_admin = ['Other', 'Repurchase Agreement', 'CDS', 'Bond']
df_admin['Type'] = np.select(conditions_admin, values_admin)
velvet thorn
#

sure, if that works for you, go ahead

grave frost
#

@velvet thorn Well I am able to do the one-hot encoding, then converting the values to float (since int gives some sort of error) but now the problem is coming in the training phase

merry fern
#

actually, the most recent addition, "isnull" is breaking it
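
A likely cause, sketched on made-up data: `df_admin['ISIN'].isnull` without parentheses is the method itself, not a boolean Series, so `np.select` cannot use it as a condition. The toy `df_admin` below is an assumption. The last condition is written as `~is_repo` on the assumption that "neither RP nor RV" was intended, since `(~RP) | (~RV)` is true for every name.

```python
import numpy as np
import pandas as pd

# Toy stand-in for df_admin; the real data was not posted.
df_admin = pd.DataFrame({
    "ISIN": [None, "X1", "X2", "X3"],
    "Name": ["FOO", "RP ABC", "BUY XYZ", "GOV 2030"],
    "Price": [1.0, 0.0, 2.0, 99.5],
})

name = df_admin["Name"].str
is_repo = name.startswith("RP", na=False) | name.startswith("RV", na=False)
is_cds = name.startswith("BUY", na=False) | name.startswith("SELL", na=False)

conditions_admin = [
    df_admin["ISIN"].isnull(),            # note the call: isnull() returns a boolean Series
    is_repo & (df_admin["Price"] == 0),
    is_cds,
    ~is_repo & (df_admin["Price"] != 0),
]
values_admin = ["Other", "Repurchase Agreement", "CDS", "Bond"]

# A string default keeps the choices and the fallback the same dtype.
df_admin["Type"] = np.select(conditions_admin, values_admin, default="Unknown")
print(df_admin)
```

`np.select` takes the first matching condition per row, so the order of the list decides ties.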

velvet thorn
#

I'm not even sure if a join would be the most efficient, but I would say it is the most idiomatic

#

@velvet thorn Well I am able to do the one-hot encoding, then converting the values to float (since int gives some sort of error) but now the problem is coming in the training phase
@grave frost what's the problem

grave frost
#

`InvalidArgumentError: 2 root error(s) found.
(0) Invalid argument: indices[0,0] = 2147483647 is not in [0, 18)
[[node sequential/embedding/embedding_lookup (defined at <ipython-input-10-bf624c056173>:13) ]]
(1) Invalid argument: indices[0,0] = 2147483647 is not in [0, 18)
[[node sequential/embedding/embedding_lookup (defined at <ipython-input-10-bf624c056173>:13) ]]
[[Adam/Adam/update/AssignSubVariableOp/_25]]
0 successful operations.
0 derived errors ignored. [Op:__inference_train_function_3133]

Errors may have originated from an input operation.
Input Source operations connected to node sequential/embedding/embedding_lookup:
sequential/embedding/embedding_lookup/2299 (defined at /usr/lib/python3.6/contextlib.py:81)

Input Source operations connected to node sequential/embedding/embedding_lookup:
sequential/embedding/embedding_lookup/2299 (defined at /usr/lib/python3.6/contextlib.py:81)

Function call stack:
train_function -> train_function
`

hasty grail
#

looks like integer overflow to me

velvet thorn
#

...I feel like something is wrong with your data processing

grave frost
#

Hm.. so TF doesn't accept float values?

velvet thorn
#

it does

hasty grail
#

can you post your code?

velvet thorn
#

but I have no idea what your pipeline is like

#

or what your code looks like

#

which is why I said show code

grave frost
#

Like this 2.10640103e+37

#

ofc, wait a min

hasty grail
#

this is why I always add assertions to my pipelines so things like this don't happen

#
MIN_VAL, MAX_VAL = 0., 1.
tf.debugging.assert_greater_equal(dataset, MIN_VAL)
tf.debugging.assert_less_equal(dataset, MAX_VAL)
grave frost
#

`for chunk in pd.read_csv("/content/hash.txt", header=None,chunksize=25000000):
df = pd.concat([df, chunk], ignore_index=True)
df.head

from sklearn.model_selection import train_test_split
train, val = train_test_split(df, test_size=0.1)
print(len(train), 'train examples')
print(len(val), 'validation examples') #(train and val are both DF's')

Batch size

BATCH_SIZE = 1

BUFFER_SIZE = 10000

Length of the vocabulary in chars

vocab = ['\n', ',', '0', '1', '2', '3', '4', '5', '6', '7', '8', '9', 'a', 'b', 'c', 'd', 'e', 'f'] #Vocab List goes here
vocab_size = len(vocab)+1

The embedding dimension

embedding_dim = 12000

dataset = tf.data.Dataset.from_tensor_slices((train[0].values.astype(float), train[1].values)).shuffle(BUFFER_SIZE).batch(BATCH_SIZE, drop_remainder=True)

for feat, targ in dataset.take(5):
print ('Features: {}, Target: {}'.format(feat, targ))

val_dataset = tf.data.Dataset.from_tensor_slices((val[0].values.astype(float), val[1].values)).shuffle(BUFFER_SIZE).batch(BATCH_SIZE, drop_remainder=True)

for feat, targ in val_dataset.take(5): #printing validation data
print ('Features_VAL: {}, Target_VAL: {}'.format(feat, targ))
`

hasty grail
#

use 3 backticks

velvet thorn
#

less @hasty grail

hasty grail
#

for the code block

grave frost
#

@hasty grail Thanx for the tip

#
0    object
1     int64
dtype: object

This is the train DataFrame

#
for chunk in pd.read_csv("/content/hash.txt", header=None,chunksize=25000000):
    df = pd.concat([df, chunk], ignore_index=True)
df.head

from sklearn.model_selection import train_test_split
train, val = train_test_split(df, test_size=0.1)
print(len(train), 'train examples')
print(len(val), 'validation examples')  #(train and val are both DF's')

# Batch size
BATCH_SIZE = 1

BUFFER_SIZE = 10000

# Length of the vocabulary in chars
vocab = ['\n', ',', '0', '1', '2', '3', '4', '5', '6', '7', '8', '9', 'a', 'b', 'c', 'd', 'e', 'f']     #Vocab List goes here
vocab_size = len(vocab)+1

# The embedding dimension
embedding_dim = 12000

dataset = tf.data.Dataset.from_tensor_slices((train[0].values.astype(float), train[1].values)).shuffle(BUFFER_SIZE).batch(BATCH_SIZE, drop_remainder=True)

for feat, targ in dataset.take(5):
  print ('Features: {}, Target: {}'.format(feat, targ))

val_dataset = tf.data.Dataset.from_tensor_slices((val[0].values.astype(float), val[1].values)).shuffle(BUFFER_SIZE).batch(BATCH_SIZE, drop_remainder=True)

for feat, targ in val_dataset.take(5):                 #printing validation data
  print ('Features_VAL: {}, Target_VAL: {}'.format(feat, targ))
#

@velvet thorn @hasty grail

velvet thorn
#

actually, the most recent addition, "isnull" is breaking it
@merry fern okay let me explain the join approach

merry fern
#

i fixed it, but i would like to understand what you were talking about last night

hasty grail
#
for chunk in pd.read_csv("/content/hash.txt", header=None,chunksize=25000000):
    df = pd.concat([df, chunk], ignore_index=True)

Does this even work? You would be trying to concat None and a DataFrame together in the first iteration

grave frost
#

Well, df.head gives a good output

#

shld I reduce the chunkSize?

hasty grail
#

what does df look like?

#

There's no point using a chunksize since you're loading the entire thing into memory anyway

velvet thorn
#

@merry fern

#
>>> df
    cond_1  cond_2  cond_3
0    False    True    True
1    False    True    True
2    False   False   False
3    False    True    True
4     True    True   False
5     True   False   False
6     True    True   False
7     True    True   False
8    False    True   False
9     True    True    True
10    True   False    True
11   False   False    True
12    True    True    True
13    True    True   False
14   False   False    True
15    True   False    True
>>> indicator_df
   cond_1  cond_2  cond_3 category
0    True    True    True    CAT_A
1   False   False    True    CAT_B
2    True   False   False    CAT_C
>>> conditions = ['cond_1', 'cond_2', 'cond_3']
>>> pd.merge(df, indicator_df, how='left', left_on=conditions, right_on=conditions)
    cond_1  cond_2  cond_3 category
0    False    True    True      NaN
1    False    True    True      NaN
2    False   False   False      NaN
3    False    True    True      NaN
4     True    True   False      NaN
5     True   False   False    CAT_C
6     True    True   False      NaN
7     True    True   False      NaN
8    False    True   False      NaN
9     True    True    True    CAT_A
10    True   False    True      NaN
11   False   False    True    CAT_B
12    True    True    True    CAT_A
13    True    True   False      NaN
14   False   False    True    CAT_B
15    True   False    True      NaN
#

(sorry, I know there are two conversations going on and this is a huge chunk)

grave frost
#

@hasty grail When loading the entire thing in memory it doesn't load due to the lack of RAM

#
[2499999 rows x 2 columns]>
AND:-
0    object
1     int64
dtype: object
hasty grail
#

What object are they exactly?

grave frost
#

[0] is supposed to look like a long list of numbers : `3790673563025180902423922202540363554017`

#

Dunno why object. I did try to convert but got error

#

So in the pipeline step I use .astype(float)

velvet thorn
#

that's too big for int64

grave frost
#

@velvet thorn well, it's a long string encoded in numbers

velvet thorn
#

might help if you explained your problem in detail

#

like why you're doing what you're doing

merry fern
#

@merry fern
@velvet thorn have to stop you for a second and thank you for chatting with me about this, between last night and today made some major leaps in processing the data. you're awesome! thank you

velvet thorn
#

yw 🙂

hasty grail
#

Can you set the dtype parameter to {'first_col': str, 'second_col': np.int32}?

grave frost
#

@velvet thorn I am making a seq2seq model that tries to find out the relationship b/w a string(encoded) and a number.

hasty grail
#

the long list should then be parsed as a string

merry fern
#

I just found that it helps to create a new python file with only the code I am working on to isolate that and then bring it back into the full file once it works accordingly

velvet thorn
#

anyway, the dataframes above should illustrate how to combine the conditions with the "indicator dataframe" (which is a mapping from conditions to intended category)

hasty grail
#

while the second column would be parsed as int32 so you don't have to cast it to int32 when using TF

velvet thorn
#

imagine also that df has additional columns which are your other features that the conditions are derived from (they won't be affected by the join)

grave frost
#

Wait how do I define the dtypes?

merry fern
#

im reading

hasty grail
#

it's a parameter in read_csv
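
A minimal sketch of what that could look like, using two fake rows in place of the real file. It assumes the file has no header, so the columns are labelled `0` and `1` and those integers are the `dtype` keys.

```python
import io

import numpy as np
import pandas as pd

# Two fake rows standing in for /content/hash.txt (no header, two columns).
raw = io.StringIO(
    "3790673563025180902423922202540363554017,1\n"
    "9128165362313010342475980190211245832428,2499996\n"
)

# With header=None the columns are labelled 0 and 1, so the dtype keys are ints.
df = pd.read_csv(raw, header=None, dtype={0: str, 1: np.int32})
print(df.dtypes)
print(df)
```

Reading the first column as a string keeps every digit; no numeric type is involved until the string is encoded on purpose.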

velvet thorn
#

what's your pandas version @grave frost

merry fern
#

okay, i see what youre saying, but how do you then join the newly added column to the original data instead of to the conditions df?

grave frost
#

1.0.5

velvet thorn
#

okay, i see what youre saying, but how do you then join the newly added column to the original data instead of to the conditions df?
@merry fern "newly added"?

#

oh.

#

okay so basically df here is the concatenation (pd.concat) of the original DataFrame with your features and conditions_df

#

then you join that on indicator_df

merry fern
#

df doesn't contain bools, it contains strings and floats, so i dont follow how to merge them

velvet thorn
#

@hasty grail When loading the entire thing in memory it doesn't load due to the lack of RAM
@grave frost why don't you train with an iterator

#

that loads on demand?

#

df doesn't contain bools, it contains strings and floats, so i dont follow how to merge them
@merry fern you're only joining on the condition columns

#

which we produced earlier, remember?

#

those are bools

#

all other columns do not participate in the join, but merely appear in the result unchanged
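
To make that concrete, here is a small sketch with invented feature columns (`Name`, `Price`) sitting next to the condition columns. Only the three `cond_*` columns are used as join keys.

```python
import pandas as pd

# Feature columns plus the boolean condition columns derived from them.
df = pd.DataFrame({
    "Name": ["RP A", "BUY B", "BOND C"],
    "Price": [0.0, 1.5, 99.0],
    "cond_1": [True, False, True],
    "cond_2": [True, False, False],
    "cond_3": [True, True, False],
})

# Mapping from a combination of conditions to the intended category.
indicator_df = pd.DataFrame({
    "cond_1": [True, False, True],
    "cond_2": [True, False, False],
    "cond_3": [True, True, False],
    "category": ["CAT_A", "CAT_B", "CAT_C"],
})

conditions = ["cond_1", "cond_2", "cond_3"]
result = pd.merge(df, indicator_df, how="left", on=conditions)
print(result)
```

`Name` and `Price` come through untouched, and `category` is appended as a new column.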

hasty grail
#

@grave frost Actually, since you're using TF anyway

#

why not use this?

grave frost
#

@velvet thorn I don't know much about them. But whenever I load csv it will run out of memory

velvet thorn
#

@velvet thorn I don't know much about them. But whenever I load csv it will run out of memory
there's a way to load part of the dataset, train on that batch, then load another part, repeat
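
One way this could be sketched with pandas alone: a generator that yields one chunk at a time, so only `rows_per_chunk` rows are in memory at once. The helper name `iter_batches` and the fake data are assumptions, and the training step is left as a comment.

```python
import io

import pandas as pd

# Ten fake rows standing in for the real file.
raw = io.StringIO("\n".join(f"{i:04x},{i}" for i in range(10)) + "\n")


def iter_batches(source, rows_per_chunk):
    """Yield (features, targets) one chunk at a time instead of loading everything."""
    reader = pd.read_csv(source, header=None, dtype={0: str}, chunksize=rows_per_chunk)
    for chunk in reader:
        yield chunk[0].to_numpy(), chunk[1].to_numpy()


seen = 0
for features, targets in iter_batches(raw, rows_per_chunk=4):
    # A real training step (hypothetical here) would consume this batch.
    seen += len(features)
print(seen)
```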

grave frost
#

@hasty grail I tried that, but again memory problem

hasty grail
#

it also supports shuffling so you can do that without using sklearn

#

did you set the buffer size?

grave frost
#

How much shld I set it to?

hasty grail
#

depends on how much memory you can afford

grave frost
#

12Gb

hasty grail
#

I think you can use the default buffer size when initing the dataset

#

but then use a larger buffer size for the shuffle method, which is in terms of number of elements

grave frost
#

Trying with 10

#

@hasty grail @velvet thorn Is this valid? looks empty to me <CsvDatasetV2 shapes: ((), ()), types: (tf.int64, tf.int64)>

hasty grail
#

you need to set a valid type for the columns

#

the x values are likely to overflow given how long they are

#

what do they even represent?

grave frost
#

so tf.float()?

#

<PaddedBatchDataset shapes: ((1,), (1,)), types: (tf.float32, tf.int64)>

hasty grail
#

[0] is supposed to look like a long list of numbers : 3790673563025180902423922202540363554017
Can you explain what this means?

grave frost
#

It's a string represented by numbers

hasty grail
#

so it's a string

grave frost
#

yep

#

converted

hasty grail
#

then you should specify the dtype as tf.string

#

right?

grave frost
#

Yeah, but I converted to int so that model can understand that

#

models work only on integers, right?

hasty grail
#

what are you trying to do?

grave frost
#

I am making a seq2seq model that tries to find out the relationship b/w a string(encoded) and a number.

hasty grail
#

oh

grave frost
#

and with that relationship predict some more numbers for the given encoded string

hasty grail
#

then you should transform that sequence of numbers into a sequence of one-hot encoded vectors

grave frost
#

Unfortunately, it's alphanumeric

hasty grail
#

it still holds

#

you can still one-hot encode it
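
A rough sketch of what one-hot encoding a hex-like string could look like in NumPy. The vocabulary is the hex-digit part of the `vocab` list posted earlier; the helper name `one_hot_encode` is made up for illustration.

```python
import numpy as np

# Character vocabulary taken from the code posted earlier (hex digits only here).
vocab = list("0123456789abcdef")
char_to_index = {ch: i for i, ch in enumerate(vocab)}


def one_hot_encode(text):
    """Return an array of shape (len(text), len(vocab)) with a single 1 per row."""
    indices = [char_to_index[ch] for ch in text]
    return np.eye(len(vocab), dtype=np.float32)[indices]


encoded = one_hot_encode("3790a")
print(encoded.shape)
print(encoded.argmax(axis=1))
```

Each character becomes its own small vector, so the length of the string never produces a large number.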

grave frost
#

but what is the problem in numbers?

hasty grail
#

it's too big

grave frost
#

but a float is not

#

does TF automatically convert the tf.string to nums?

arctic wedgeBOT
#

You are not allowed to use that command here. Please use the #bot-commands channel instead.

desert oar
#

@hasty grail the bot doesnt have numpy installed

grave frost
#

NameError: name 'iiinfo' is not defined

hasty grail
#

It does, actually

pale thunder
#

I am pretty sure it does

desert oar
#

no kidding

#

!e ```python
import numpy as np
print( np.array([1,2,3]) )

arctic wedgeBOT
#

@desert oar :white_check_mark: Your eval job has completed with return code 0.

[1 2 3]
desert oar
#

looks like im mistaken, thats very helpful to know

#

!e ```python
import pandas as pd
print(pd.Series([1,2,3]))

arctic wedgeBOT
#

@desert oar :white_check_mark: Your eval job has completed with return code 0.

001 | 0    1
002 | 1    2
003 | 2    3
004 | dtype: int64
desert oar
#

😮

pale thunder
#

also networkx and forbiddenfruit

hasty grail
#
import numpy as np
print(np.iinfo(np.int64).max)
print(np.finfo(np.float64).max)
desert oar
#

!e ```python
import numpy as np
print(np.iinfo(np.int64).max)
print(np.finfo(np.float64).max)

arctic wedgeBOT
#

@desert oar :white_check_mark: Your eval job has completed with return code 0.

001 | 9223372036854775807
002 | 1.7976931348623157e+308
grave frost
#

seems big enough for my purpose

desert oar
#

note that the numpy max isnt necessarily the same as the python max

hasty grail
#

what error function are you planning to use for this?

grave frost
#

Ok no it isn't

desert oar
#

why do you need numbers that are so large? what are you doing exactly?

pale thunder
#

you can also get unbounded integers in numpy with the object dtype and using python ints at a pretty severe performance hit
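
A quick sketch of the three cases, using the example value quoted earlier in the chat: it overflows int64, it loses digits as a float64, and it survives intact only as a Python int (or inside an object array).

```python
import numpy as np

n = 3790673563025180902423922202540363554017  # example value quoted earlier in the chat

# Python ints are unbounded, but the value does not fit in int64.
too_big_for_int64 = n > np.iinfo(np.int64).max
print(too_big_for_int64)

# float64 can hold the magnitude but not all 40 digits.
survives_float = int(float(n)) == n
print(survives_float)

# The object dtype stores Python ints exactly, at a performance cost.
arr = np.array([n], dtype=object)
print(arr[0] == n)
```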

desert oar
#

i see some stuff about tensorflow above

grave frost
#

sparse_categorical_crossentropy

desert oar
#

that's returning numbers that are on the order of 1e300?

hasty grail
#

I am making a seq2seq model that tries to find out the relationship b/w a string(encoded) and a number.

#

Can you provide an example?

grave frost
#

9128165362313010342475980190211245832428 --> 2499996

desert oar
#

yeah i would need to see an example too. but if it's seq2seq i'd imagine you would want to encode the number as a string of digits rather than a number

#

are you the one who was trying to use ML to reverse hashes a while ago?

#

is that what these are?

grave frost
#

@desert oar do models work with tf.string?

#

yep

merry fern
#

so lets say i have 2 dataframes i want to combine, but in the new combined dataframe, i want each dataframe to have an index from its source...

desert oar
#

i dont know what that question is supposed to mean

hasty grail
#

The fact that you're using sparse_categorical_crossentropy implies that there's a (relatively) small set of output values

desert oar
#

@merry fern be careful, what does "combine" mean here?

hasty grail
#

But your output is 2499996...

grave frost
#

There is a small set

merry fern
#

@merry fern be careful, what does "combine" mean here?
@desert oar not concat, but to add the rows together

desert oar
#

@grave frost cross entropy is for categorical targets. you are not going to have 1 target for literally every real number integer

grave frost
#

It's just a combination of different classes including integers

merry fern
#

so i think merge

hasty grail
#

Can you describe the set of outputs?

desert oar
#

@merry fern can you give an example of inputs and outputs for this? fake data is fine, maybe just 5 rows

grave frost
#

It will predict many classes in succession right? (like 1 then 3 etc..)

desert oar
#

@grave frost can you give an example of more records here?

#

where are these digit strings coming from

hasty grail
#

what exactly is the relationship you're trying to model

desert oar
#

and where are the numbers coming from

grave frost
#

Of a hash and it's decoded value

desert oar
#

ok. and what is the space of all valid decoded values?

grave frost
#

but that's more confusing to explain

#

from 1-2.5 Million

desert oar
#

no it's not more confusing to explain

merry fern
hasty grail
#

how many possible in/out values are there?

desert oar
#

it sounds like 2.5 million outputs

hasty grail
#

that's definitely not a small set

desert oar
#

@merry fern so you want to stack them vertically?

grave frost
#

So which loss to use?

merry fern
#

@merry fern so you want to stack them vertically?
@desert oar right, to create a master list

im compiling differences between data sources, and then i want to have a master list of differences

#

without losing reference to where those differences originate

grave frost
#

So what exactly should I do to accomplish my goal?

hasty grail
#

assuming your decoded values are in the range of int64 you can use MAE

desert oar
#

!e ```python
import pandas as pd

data1 = pd.DataFrame({
'XYZ': {'ISIN': 123, 'Q': 100, 'P': 1},
'ABC': {'ISIN': 345, 'Q': 200, 'P': 2},
}).rename_axis(index='Type')

data2 = pd.DataFrame({
'XYZ': {'ISIN': 123, 'Q': 100, 'P': 1},
'ABC': {'ISIN': 345, 'Q': 200, 'P': 2},
}).rename_axis(index='Type')

data = pd.concat({'A': data1, 'B': data2})
print( data )

grave frost
#

@hasty grail mape?

arctic wedgeBOT
#

@desert oar :white_check_mark: Your eval job has completed with return code 0.

001 |         XYZ  ABC
002 |   Type          
003 | A ISIN  123  345
004 |   Q     100  200
005 |   P       1    2
006 | B ISIN  123  345
007 |   Q     100  200
008 |   P       1    2
hasty grail
#

sorry, MAE

grave frost
#

Will that fix the error?

hasty grail
#

no

#

your inputs are fundamentally incorrect

desert oar
hasty grail
#

you need to input it as a sequence instead of a single number

desert oar
#

and yes 100% the input needs to be a sequence of digits and not a number

grave frost
#

SO a string?

desert oar
#

yes, a string

#

that just happens to contain digits

hasty grail
#

more like a sequence of one-hot encoded vectors

grave frost
#

so Tf.String is an acceptable dtype?

desert oar
#

but you still need to figure out how to convert said string to numbers

#

whether one-hot-encoded or something else

#

@merry fern did you see my example above? the important part is passing a dict to pd.concat

grave frost
#

But not the one long number?

desert oar
#

correct, definitely not one big number

hasty grail
#

one-hot is the simplest way to do it imo

desert oar
#

since you only have 10 individual characters in a string, one hot encoding is parsimonious and sensible

grave frost
#

alright, I will implement it. Anything else I need to keep in mind?

desert oar
#

like i said, use negative sampling

#

i linked a blog post about it above

#

it's what they use in "traditional" word2vec to improve training time

merry fern
#

@desert oar this part of the code is what i was looking for! data = pd.concat({'A': data1, 'B': data2})

desert oar
#

you could try your model as a regression model too @grave frost but im skeptical that it's the right thing here

#

@merry fern good. note that this will give you a multi-index on your dataframe, which increases the complexity level

grave frost
#

No regression

hasty grail
#

some hashes are done such that the distribution is essentially random

grave frost
#

It is random. That's the whole point

desert oar
#

yeah i also think this project will go nowhere btw

#

might be a good exercise in learning tensorflow but

merry fern
#

yes thats what i meant, multi-index. the funny thing is when i output to csv, it puts the multi-index in every line

but in the python console, it just shows a header line and then goes forward, example:

Master Break List 
                                Quantity     Price
           Type ISIN                             
Int vs. PB Bond RU000A0JWHA4  7000000.0 -0.258750
                RU000A0ZYYN4 -3000000.0 -1.123458

vs.

desert oar
#

if you could reverse hashes with machine learning we'd have really big problems on our hands

hasty grail
#

yeah xD

merry fern
#

is that by design?

desert oar
#

@merry fern yes, that's intentional
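
A small sketch of that behaviour with invented rows (the `FAKE00000001` identifier and all values are made up): `to_csv` writes every index level on every row, and passing the same level names as `index_col` when reading back rebuilds the MultiIndex.

```python
import io

import pandas as pd

data1 = pd.DataFrame(
    {"Quantity": [7000000.0, -3000000.0], "Price": [-0.25, -1.5]},
    index=pd.Index(["RU000A0JWHA4", "RU000A0ZYYN4"], name="ISIN"),
)
data2 = pd.DataFrame(
    {"Quantity": [100.0], "Price": [0.5]},
    index=pd.Index(["FAKE00000001"], name="ISIN"),
)
data = pd.concat({"A": data1, "B": data2}, names=["source"])

# to_csv writes every index level on every row; the console only hides repeats.
buffer = io.StringIO()
data.to_csv(buffer)
print(buffer.getvalue())

# Reading the index columns back rebuilds the same two-level index.
buffer.seek(0)
restored = pd.read_csv(buffer, index_col=["source", "ISIN"])
print(restored.index.names)
```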

merry fern
#

great

grave frost
#

I have to find a very Small relationship. Something that even bring the loss to 0.002 is outstanding

merry fern
#

awesome awesome awesome

grave frost
#

It's just a POC

hasty grail
#

0.2 would be outstanding already

grave frost
#

For another architecture I had in mind

desert oar
#

indeed, you are going to basically be depending on the PRNG being insecure

merry fern
#

i have made leaps and bounds in digesting excel data, mapping columns, creating new columns based off data, and organizing the output! thank you @desert oar @velvet thorn and everyone else

desert oar
#

if you can get the accuracy anywhere close to 5% i'll be surprised

#

@merry fern thats great to hear. the more you learn the easier (and more fun) it gets

merry fern
#

yes!

#

and hope to someday be able to sit in here and answer some questions too...

desert oar
#

you probably already can

grave frost
#

I am expecting val acc to be 0.004-ish

hasty grail
#

depending on how many possible outputs are there

grave frost
#

That's the most realistic

#

0.004%

desert oar
#

also here's a hot tip @merry fern :

#

!e ```python
import pandas as pd

data1 = pd.DataFrame({
'XYZ': {'ISIN': 123, 'Q': 100, 'P': 1},
'ABC': {'ISIN': 345, 'Q': 200, 'P': 2},
}).rename_axis(index='Type')

data2 = pd.DataFrame({
'XYZ': {'ISIN': 123, 'Q': 100, 'P': 1},
'ABC': {'ISIN': 345, 'Q': 200, 'P': 2},
}).rename_axis(index='Type')

data = pd.concat({'A': data1, 'B': data2}, names=['source'])
print( data )

arctic wedgeBOT
#

@desert oar :white_check_mark: Your eval job has completed with return code 0.

001 |              XYZ  ABC
002 | source Type          
003 | A      ISIN  123  345
004 |        Q     100  200
005 |        P       1    2
006 | B      ISIN  123  345
007 |        Q     100  200
008 |        P       1    2
desert oar
#

look at the names of the multiindex when you use the names= parameter

#

nice little convenience feature

grave frost
#

Negative sampling drops out some inputs

#

*not consider them

desert oar
#

@merry fern otherwise you'd have to write

data = pd.concat({'A': data1, 'B': data2}).rename_axis(index=['source', 'Type'])
merry fern
#

nice i just added that

desert oar
#

@grave frost yes, otherwise it would be computationally ridiculous to compute the weight update for all 2.5m outputs

grave frost
#

Alright, as long as it implement in a line or 2 (as if that ever happens)

desert oar
#

thats the wrong attitude

grave frost
#

But not gonna lose the optimism

#

🙂

desert oar
#

you use the methods that exist, not the tools you happen to have sitting right in front of you

#

if you want to do something, either you do it or you don't

#

try without negative sampling and see what happens

merry fern
#

leaving for now, need to workout i feel l like shit. ttyl

desert oar
#

if it works, then great

grave frost
#

ok, 1-hot, MAE and -ve sampling. anything else?

desert oar
#

but it will probably be dog slow

#

MAE is incompatible with classification

#

you know this

grave frost
#

No

desert oar
#

do you know what MAE stands for?

#

do you know how it is defined?

#

it literally does not make sense on a classification problem

#

it is for regression problems

#

if you do this as a regression problem, then yes MAE is valid

grave frost
#

Damn

#

and as a classification?

desert oar
#

accuracy, precision, recall, f1, ...
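
A toy illustration of why MAE does not fit class labels, with made-up labels and predictions: renumbering the classes changes nothing about which predictions are right, so accuracy stays the same, but MAE moves because it treats the label numbers as distances.

```python
import numpy as np

y_true = np.array([0, 1, 2, 2, 1])
y_pred = np.array([0, 2, 2, 0, 1])


def accuracy(t, p):
    return float(np.mean(t == p))


def mae(t, p):
    return float(np.mean(np.abs(t - p)))


# Relabel the classes (0->2, 1->0, 2->1); the predictions are exactly as right or wrong as before.
relabel = np.array([2, 0, 1])
t2, p2 = relabel[y_true], relabel[y_pred]

print(accuracy(y_true, y_pred), accuracy(t2, p2))  # identical
print(mae(y_true, y_pred), mae(t2, p2))            # changes with the arbitrary labels
```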

severe spindle
#

I would like to learn how to use machine learning algorithms, I have no prior experience with machine learning. Can someone recommend a good place to learn from?

desert oar
#

@severe spindle how much programming experience do you have?

grave frost
#

k, I will do classification

severe spindle
#

@desert oar Several years , I'm a third year computer science student at university

desert oar
#

using machine learning algorithms is not the right way to think about it. machine learning is a process that requires using these algorithms at some point

#

they dont make sense outside the context of actually doing some kind of prediction or other machine learning work

#

since you already know programming, the course https://fast.ai might be a good place to start

#

they get you right into the sexy deep learning stuff

severe spindle
#

okay sounds really good, thanks for explaining a little too 😄

desert oar
#

if you want to make a career out of this (or do more than trivial hobby projects) you'll eventually want to go back and get into the foundational math and start learning stats as well

grave frost
#

Thanx a ton for your help guys and placing me in the right direction! 🙂 @desert oar @hasty grail @velvet thorn

hasty grail
#

np and... good luck

desert oar
#

you're welcome both, sorry i'm in a curt mood today but i just dont want people wasting their time on stuff that's not worth doing

grave frost
#

Nothing to worry about. I know it is hopeless, but I need a baseline

hasty grail
#

This might be a place to start

grave frost
#

It is actually a POC for establishing that there is bias in randomness. A base for a new architecture I am planning to develop.

hasty grail
#

first thing that showed up in my search

severe spindle
#

I'm taking some ML modules this year so I'm sure I'll have to hit the books for the maths at some point too but you've given me a good place to start, thanks

desert oar
#

there is bias in randomness
i think you might need to qualify this a bit. there is a lot of literature already on cryptography... like 50 years of it, written by very smart people

grave frost
#

NN's cannot predict plaintext out of encrypted text. The best you can do is REDUCE the time taken to decrypt, which is what my theoretical architecture does. But right now it's in a very naive and basic phase (too many assumptions, too little testing). It would need people with PhDs to make it...

#

@desert oar A simple assumption that anything random can be proved to be non-random given a complex enough function

#

That is basically what linear models do, albeit on a very small scale.

#

There are several theorems and proofs on it

#

Good Night to all!

desert oar
#

i dont think thats what the universal approximation theorem actually says...

#

but good luck

merry fern
#

@desert oar next rabbit hole for me is rendering results in a web platform 😛

grave frost
#

@desert oar You don't need to understand any theorems for the basic idea. It's pretty intuitive in itself. Consider a series of numbers: 0,2,4,6,8. Then f(x)=2x, for x in W. Now consider a relatively more complex series: 0,1,4,9. Here, f(x)=x^2. For ANY given sequence of numbers, I can compute a corresponding function to represent that data. Assume you have much less intelligence than the average human, only enough to grasp basic arithmetic. Then the second set of numbers might look like random numerals to you. But actually they are governed by a function whose complexity is outside your understanding. A NN tries to approximate that function. So no matter how random a set of numbers you give, there will always be a relation. Now it might be that the relation is too complex to be computable with normal machines and would require quantum-level power to compute, but there will always be a relation. http://neuralnetworksanddeeplearning.com/chap4.html Look at this link for a more visual idea

elder creek
#

Hey all! Anyone able to answer a question about calling data from a linear regression model (OLS) generated by statsmodels?

#

I'm tryin to get a list of the pvalues greater than .05, using:

print('P Values: ', model.pvalues > .05)

This returns a df showing every row and the Boolean value for the pvalue of each row. I'm new to python (and programming), so I don't know a lot of things that should be obvious.

strong osprey
#

Hey guys! How can I use current value when updating data in Pandas DataFrame?

desert oar
#

@grave frost i am familiar with the universal approximation theorem. what you seem to be ignorant of is the large body of work dedicated to making PRNGs look and act as random as possible, specifically for the purpose of making what you are trying to do effectively impossible

#

it more or less forms the basis of modern cryptography

#

if you dont want to take my word for it, go ask about it on a math forum or computer science forum or cryptography forum. see what they have to say about your project

#

i'm willing to be proven wrong, but not by a misquoted textbook chapter

#

@strong osprey can you clarify what you are trying to do? preferably with sample input data and the desired result

#

@elder creek the result of model.pvalues > .05 is a Series containing Boolean values -- you can use that Series to select only the True rows like so:

model.pvalues.loc[model.pvalues > .05]
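
For anyone following along without a fitted model: when the model is fit from pandas data, `model.pvalues` is a pandas Series indexed by term name, so the selection above is ordinary boolean indexing. A minimal sketch with a hand-made Series standing in for `model.pvalues` (the term names and values are invented):

```python
import pandas as pd

# Stand-in for model.pvalues; names and values are invented for illustration.
pvalues = pd.Series({"const": 0.001, "x1": 0.20, "x2": 0.03, "x3": 0.07})

mask = pvalues > 0.05            # Boolean Series, one entry per term
not_significant = pvalues[mask]  # keeps only the entries where mask is True

print(not_significant)
print(list(not_significant.index))  # just the term names
```
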
elder creek
#

Yes

desert oar
#

(mistakes corrected above)

elder creek
#

Wow, awesome, thank you so much

desert oar
#

i'd also encourage you to avoid selecting features blindly by p > 0.05

strong osprey
#

I want to append data to a cell. I know I can do:

self.data.loc[self.data['SKU'] == '826945379', 'Images'] = 'aaa'
#

but how can i use the data that already is in the cell

elder creek
#

I've got 136 beta values... just looking to remove the ones with more than .05 pvalue

desert oar
#

why 0.05? why not 0.01? have you adjusted the p-values for multiple comparisons? does it even make sense to compare the coefficient to 0? is the model actually homoskedastic i.e. does it satisfy the statistical assumptions required to do such t-tests?
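
On the multiple-comparisons point: with 136 coefficients and a 0.05 cutoff, about 7 would be expected to pass by chance alone even if none had a real effect (136 × 0.05 = 6.8). A rough sketch of the Bonferroni and Holm adjustments in plain Python; the function names and sample p-values are made up here, and statsmodels provides these corrections in `statsmodels.stats.multitest.multipletests`:

```python
def bonferroni(pvalues):
    """Multiply each p-value by the number of tests, capping at 1."""
    m = len(pvalues)
    return [min(1.0, p * m) for p in pvalues]


def holm(pvalues):
    """Holm step-down adjustment; never less powerful than Bonferroni."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_max = 0.0
    for rank, i in enumerate(order):
        # The k-th smallest p-value (k starting at 0) is scaled by (m - k);
        # the running maximum keeps the adjusted values monotone.
        running_max = max(running_max, min(1.0, pvalues[i] * (m - rank)))
        adjusted[i] = running_max
    return adjusted


raw = [0.001, 0.04, 0.03, 0.2]  # made-up p-values
print(bonferroni(raw))
print(holm(raw))
```
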

#

that is an outdated feature selection procedure in my opinion

#

what is the purpose of your model? to make predictions? or to make inferences about underlying relationships?

elder creek
#

Wow, interesting

#

Yeah, making predictions

desert oar
#

in that case, use regularization, ridge or lasso

elder creek
#

Using lasso soon

#

This is more about understanding what is going on conceptually

desert oar
#

banish all thought of stepwise selection

elder creek
#

I get the concept piece

#

It's for a class... I've gotten the answers I need, I just want to call it in a tidy manner

desert oar
#

@strong osprey
```python
# foo is a placeholder for whatever transformation you want to apply
sel = self.data['SKU'] == '826945379'
self.data.loc[sel, 'Images'] = foo(self.data.loc[sel, 'Images'])
```

jaunty scroll
#

How do I access attributes of an XML element if those attributes are subelements?

desert oar
#

its a shame they are still teaching that shit in classes

#

anyway hopefully the code snippet helps
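
To make the `foo(...)` placeholder concrete: the right-hand side can be any expression built from the current values. A small sketch, assuming `Images` holds comma-separated strings (the data and the new filename are made up):

```python
import pandas as pd

data = pd.DataFrame({
    "SKU": ["826945379", "111111111"],
    "Images": ["a.jpg", "x.jpg"],
})

sel = data["SKU"] == "826945379"
# Read the current cell value(s), append to them, and write the result back.
data.loc[sel, "Images"] = data.loc[sel, "Images"] + ",b.jpg"

print(data)
```

Rows that don't match `sel` are left untouched.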

#

@jaunty scroll are you using a specific library to do this?

jaunty scroll
#

@desert oar element tree

desert oar
#

and can you give an example of some XML & the resulting values you want?

#

this isnt really a data science question but it might be relevant to people here. normally in the future i would recommend asking questions like this in a help channel, see #❓|how-to-get-help

jaunty scroll
#

oh ok @desert oar that's good to know I wasn

strong osprey
#

@desert oar thanks, i see

jaunty scroll
#

't really sure where to ask

#

I'm asking it here because the end result of this question is going to be a parser that converts to csv and from there into RedShift using dataframe

desert oar
#

ah, then it probably is relevant here

#

hard to know

jaunty scroll
#

yeah one of those fringe projects imo

#

this parser has been a pain because part of the XML just has standard tags and attributes and then here it does this multi-level thing that breaks the program unless I just treat all the elements as equivalent which is useless

desert oar
#

do you have the actual definition for ns1?

#

i think lxml has better namespace support

#

if you include that as text and not just an image, i can make an example for you @jaunty scroll

jaunty scroll
#

one moment sorry was afk

#
                        <ns1:recordIdentifier>33</ns1:recordIdentifier>
                        <ns1:insuredMemberIdentifier>P12561002023TRS</ns1:insuredMemberIdentifier>
                        <ns1:insuredMemberBirthDate>1989-12-31</ns1:insuredMemberBirthDate>
                        <ns1:insuredMemberGenderCode>M</ns1:insuredMemberGenderCode>
                        <ns1:includedInsuredMemberProfile>
                                <ns1:recordIdentifier>34</ns1:recordIdentifier>
                                <ns1:subscriberIndicator>S</ns1:subscriberIndicator>
                                <ns1:subscriberIdentifier></ns1:subscriberIdentifier>
                                <ns1:insurancePlanIdentifier>93182VA013001402</ns1:insurancePlanIdentifier>
                                <ns1:coverageStartDate>2015-01-01</ns1:coverageStartDate>
                                <ns1:coverageEndDate>2015-12-31</ns1:coverageEndDate>
                                <ns1:enrollmentMaintenanceTypeCode>021028</ns1:enrollmentMaintenanceTypeCode>
                                <ns1:insurancePlanPremiumAmount>450.00</ns1:insurancePlanPremiumAmount>
                                <ns1:rateAreaIdentifier>003</ns1:rateAreaIdentifier>
                        </ns1:includedInsuredMemberProfile>
                </ns1:includedInsuredMember>
                <ns1:includedInsuredMember>
#

do you want like an actual text file?

desert oar
#

uhh this isnt PII right?

jaunty scroll
#

no

#

its example data

desert oar
#

ok 😅

#

and you want 1 insured member per row?

jaunty scroll
#

my parser breaks when it gets down to the bottom of these multi-level stacks like this because I think its looking for a standard attribute like a string or integer

desert oar
#

can you show your current code

jaunty scroll
#
```python
for node in tree.iter(None):
    print('\n')
    for elem in node.iter():
        if elem.tag != node.tag:
            print("{}: {}".format(elem.tag, elem.text))
```
desert oar
#

ok 1 min

#

this is maybe a stupid question but why does the insuredMemberProfile have a separate record identifier from the insuredMember itself

jaunty scroll
#

that is actually a good question that I couldn't even begin to answer other than to say that the fine folks in the federal government choose to organize their files this way and I have no say in the matter

desert oar
#

lol ok

#

the reason i ask is -- you want to flatten all that stuff in there into 1 record?

jaunty scroll
#

yea, ideally I want nested dictionaries that can be read into dataframe

#

I'm very new at this and kind of learning as I go so if there's something I say that makes no sense please correct me or ask for clarification

desert oar
#
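import xml.etree.ElementTree as ET

tree = ET.parse('InboundMedicalClaimFileExample.xml')

included_insured_members = []
# ElementTree stores tags as '{namespace-uri}name', so replace the 'ns1:'
# prefix below with '{...}' using the URI from the file's xmlns:ns1 declaration.
for member in tree.iter('ns1:includedInsuredMember'):
    member_info = {}
    included_insured_members.append(member_info)
    # Walk the member's children; the nested profile is flattened into the same dict.
    for node in member:
        if node.tag == 'ns1:includedInsuredMemberProfile':
            for subnode in node:
                member_info[subnode.tag] = subnode.text
        else:
            member_info[node.tag] = node.text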

what about something like this?
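
One catch with the snippet above: ElementTree replaces a prefix like `ns1:` with the full namespace URI, so `node.tag` looks like `{uri}includedInsuredMember` and a literal `'ns1:...'` string will not match. A self-contained sketch using a trimmed copy of the sample; the root element name and the `http://example.com/ns1` URI are placeholders, since the real `ns1` definition wasn't posted:

```python
import xml.etree.ElementTree as ET

# Hypothetical wrapper element and namespace URI; only the inner elements come from the sample.
XML = """\
<ns1:root xmlns:ns1="http://example.com/ns1">
  <ns1:includedInsuredMember>
    <ns1:recordIdentifier>33</ns1:recordIdentifier>
    <ns1:insuredMemberGenderCode>M</ns1:insuredMemberGenderCode>
    <ns1:includedInsuredMemberProfile>
      <ns1:recordIdentifier>34</ns1:recordIdentifier>
      <ns1:subscriberIndicator>S</ns1:subscriberIndicator>
    </ns1:includedInsuredMemberProfile>
  </ns1:includedInsuredMember>
</ns1:root>
"""

NS = {"ns1": "http://example.com/ns1"}


def local_name(tag):
    """Strip the '{uri}' part that ElementTree puts in front of each tag."""
    return tag.split("}", 1)[-1]


root = ET.fromstring(XML)
members = []
for member in root.findall("ns1:includedInsuredMember", NS):
    info = {}
    for child in member:
        if len(child):  # has child elements: the nested profile
            for sub in child:
                # Prefix with the parent name so the two recordIdentifiers don't collide.
                info[local_name(child.tag) + "." + local_name(sub.tag)] = sub.text
        else:
            info[local_name(child.tag)] = child.text
    members.append(info)

print(members)
```

A list of flat dicts like this can be passed directly to `pandas.DataFrame(members)`, one insured member per row.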

jaunty scroll
#

let me play with this for a min but this looks hopefully

#

hopeful*

#

so its giving me a mismatched tag error

#
  File "test.py", line 3, in <module>
    tree = ET.parse('inboundenrollmentfile.xml')
  File "C:\Program Files\WindowsApps\PythonSoftwareFoundation.Python.3.8_3.8.1729.0_x64__qbz5n2kfra8p0\lib\xml\etree\ElementTree.py", line 1202, in parse
    tree.parse(source, parser)
  File "C:\Program Files\WindowsApps\PythonSoftwareFoundation.Python.3.8_3.8.1729.0_x64__qbz5n2kfra8p0\lib\xml\etree\ElementTree.py", line 595, in parse
    self._root = parser._parse_whole(source)
xml.etree.ElementTree.ParseError: mismatched tag: line 439, column 2
desert oar
#

that sounds like a problem in the file

jaunty scroll
#

ah yes it is, that's the last line in the file

#

I just sent you the middle part because that's what was giving me the error but I think if I update these tags to capture the whole file this could work

#

I could extend this code to work for multiple such tiered data structures right?

desert oar
#

yes of course

#

notably, bool(node) will tell you if the node has children

#

so you can say
```python
if node:
    for subnode in node:
        ...
else:
    ...  # leaf node: use node.text
```

jaunty scroll
#

cool thanks for the help it is much appreciated

desert oar
#

@jaunty scroll you can flatten nodes recursively/infinitely too if you want

def flatten_node(node):
    result_container = {}
    if node:
        for subnode in node:
            result_container = {**result_container, **flatten_node(subnode)}
    else:
        result_container[node.tag] = node.text
    return result_container

although this will fail on XML where you have 2 nodes with the same tag

#

(among many other limitations)
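
A variant of the recursive idea that sidesteps part of the duplicate-tag problem by keying on the full path instead of the bare tag. It also uses `len(node)` rather than `if node:`, since truth-testing an Element is deprecated in recent Python versions. The sample XML is made up, and repeated siblings with the same path would still overwrite each other:

```python
import xml.etree.ElementTree as ET


def flatten_node(node, prefix=""):
    """Flatten an element into {dotted.path: text} for every leaf below it."""
    path = prefix + node.tag
    if len(node) == 0:  # leaf: no child elements
        return {path: node.text}
    result = {}
    for subnode in node:
        result.update(flatten_node(subnode, path + "."))
    return result


root = ET.fromstring(
    "<member><id>33</id><profile><id>34</id><plan>X1</plan></profile></member>"
)
print(flatten_node(root))
```
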

jaunty scroll
#

what's the purpose of those ** @desert oar

#

aren't those usually for kwargs in function parameters?

desert oar
#

@jaunty scroll it serves the same function here, except to the {} dict constructor

#

!e ```python
x = {'a': 1, 'b': 2}
y = {'b': 102, 'c': 103}
print({**x, **y})
```

arctic wedgeBOT
#

@desert oar :white_check_mark: Your eval job has completed with return code 0.

{'a': 1, 'b': 102, 'c': 103}
jaunty scroll
#

so is constructor that a kind of anonymous function? not sure if that's the right term but it is essentially doing what a function would do here it seems like

#

that constructor*

desert oar
#

its not a function

#

but it uses similar syntax

#

you can think of it as the people who designed the python language being clever and letting you use the same syntax in multiple places to mean similar things
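
To see the parallel side by side (the `describe` function is just a made-up example):

```python
def describe(a, b, c=0):
    return f"a={a} b={b} c={c}"


x = {"a": 1, "b": 2}
y = {"b": 102, "c": 103}

merged = {**x, **y}        # inside {}: keys are copied in, later ones win
print(merged)

print(describe(**merged))  # inside a call: keys become keyword arguments

# One difference: in a call, a repeated key is an error rather than an override.
try:
    describe(**x, **y)
except TypeError as exc:
    print("TypeError:", exc)
```
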

feral trout
#

how do I get the plain text from pytesseract.image_to_data?

wise garden
#

I've got two dfs (both 24 rows) in pandas, but they have different indices and I can't figure out how to multiply them together. The result is a Series that has 25 rows so I know something is wrong

#

The different indices are due to the fact that I sliced them from different parts of a data set

rustic apex
#

How often do you use Numpy by itself? Vs with Pandas or MatLab?

desert oar
#

@wise garden do you not care about the indices? then just do x.reset_index(drop=True) * y.reset_index(drop=True)

#

if you do care about the indexes youll have to do some more work
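
For context on where the extra row comes from: pandas arithmetic matches rows by index label, not by position, and the result covers the union of both indexes. A toy sketch with made-up numbers (3 rows in, 4 rows out, the same kind of effect as 24 in, 25 out):

```python
import pandas as pd

x = pd.Series([1.0, 2.0, 3.0], index=[0, 1, 2])
y = pd.Series([10.0, 20.0, 30.0], index=[1, 2, 3])  # sliced from elsewhere, labels shifted by one

aligned = x * y  # matched on labels: covers 0..3, NaN where one side has no such label
by_position = x.reset_index(drop=True) * y.reset_index(drop=True)

print(aligned)
print(by_position)
```

If the labels matter, the usual options are to give both sides the same index explicitly or to merge on a shared key column first.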

wise garden
#

no this is perfect thx

keen root
#

Hi, not sure if this is the right channel, but are all GPU calculations made with tensorflow based on CUDA? That is, if I have a GPU without CUDA support I'm "doomed"?

#

I tried to run a CNN the other day and poor CPU... I could almost hear the screams

desert oar
#

@keen root pretty much, yes

#

CUDA is what allows programs to "talk to" the GPU without writing low level graphics code

#

tensorflow certainly depends on it

#

i specifically bought an nvidia 1060 for this reason 🙂 even though its not good for actual machine learning performance, at least it runs CUDA so its good enough to test code before paying for cloud compute

keen root
#

that's too bad... This old baby goes back to early high school. If only I knew I would be needing cuda someday 😅

#

thank you anyway

serene scaffold
#

@desert oar what did that cost?

desert oar
#

@serene scaffold i got the whole rig for $300 without SSD

#

it was a 4 year old gaming rig

serene scaffold
#

I'm hopefully going to build a pc when I graduate

desert oar
#

some local kid was upgrading

serene scaffold
#

Of course they were

#

I'm totally not jealous of gamer kids with better machines than me.

desert oar
#

it was a steal tbh, i dont play recent games much, or if i do i dont care if they are max settings

serene scaffold
#

Also I finished that assignment. The prof said it was the hardest thing for the whole course. But it seemed like it was just pandas and numpy fundamentals.

#

So idk what the rest of the class will even be.

desert oar
#

which assignment was it again?

#

hopefully that doesnt mean you did it wrong 😉

serene scaffold
#

Mean imputation and hot deck imputation

desert oar
#

that, or, they were expecting people to do it in java

serene scaffold
#

I got the same answers as two of my friends.

#

I mean I guess you could have done it in Java if you wanted to be eight levels deep in for loops.

desert oar
#

yeah idk

#

maybe its an easy course i guess

#

seems like a waste of time if thats the hardest thing in the whole semester

#

no offense to your instructor but, theres no point in paying for school if you arent being pushed past your limits in a controlled and constructive way (imo)

#

otherwise you'd just go read a book at home

#

(there are other benefits to school too, namely the opportunity to meet and talk with other people working on similar problems as you with similar interests who might later form a professional network and also form a support network while in school)

serene scaffold
#

@desert oar I think of formal education as a really expensive way to get your knowledge in a certain area accredited, and that all learning is basically self-learning.

desert oar
#

and this is why people feel like education is a ripoff

#

because if thats all you got out of school, then it was a ripoff

serene scaffold
#

yes

desert oar
#

what i am saying is, it doesnt have to be like that, and shouldnt be like that

serene scaffold
#

the next assignment uses data miner

#

or some platform like that. can't remember the name for sure

merry fern
#

classes and dataframes - would it be smart to create a subclass of dataframes if I wanted to dictate their behavior?

for example, if a dataframe was empty, and I went to print it, I would want it to print "No data."

spring elk
#

Has anyone ever rented server time/processing time, if so what did you need it for and what requirements did you have ?

desert oar
#

@merry fern yes you could subclass dataframe and implement a new __str__ method in the subclass

merry fern
#

@merry fern yes you could subclass dataframe and implement a new __str__ method in the subclass
@desert oar
Would it be simple in the sense that I would just setup init and str but it wouldn't screw up any other functionality built in?

#

Thanks

desert oar
#

you dont even need init @merry fern

#

!e ```python
import pandas as pd

class MyDataFrame(pd.DataFrame):
    def __str__(self):
        # Show a message instead of the default "Empty DataFrame" text
        if self.shape[0] == 0:
            return 'No data.'
        else:
            return super().__str__()

df = MyDataFrame(columns=list('xyz'))
print(df)
```

arctic wedgeBOT
#

@desert oar :white_check_mark: Your eval job has completed with return code 0.

No data.
merry fern
#

Coolz

#

Thx

desert oar
#

that said... i dont really recommend this

#

it's not easy to "convert" a regular dataframe to this custom dataframe

#

you have 2 other options: 1) directly override DataFrame.__str__, 2) just write a custom pretty-print function for data frames

#

overriding __str__ directly is the worse of the two

#

so i'd recommend just writing a function
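
A minimal sketch of the function approach; the name `format_frame` and the default message are made up here:

```python
import pandas as pd


def format_frame(df, empty_message="No data."):
    """Return the usual text form of df, or a message when it has no rows."""
    if len(df) == 0:
        return empty_message
    return df.to_string()


print(format_frame(pd.DataFrame(columns=list("xyz"))))
print(format_frame(pd.DataFrame({"x": [1], "y": [2]})))
```

This works on any regular DataFrame, so nothing has to be converted to a custom class first.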

velvet thorn
#

classes and dataframes - would it be smart to create a subclass of dataframes if I wanted to dictate their behavior?

for example, if a dataframe was empty, and I went to print it, I would want it to print "No data."
@merry fern but why?

safe tapir
#

Anyone know the approximate performance delta between Numba and Numpy?

When preprocessing data, is it better to try to use the built-in pandas functions, or just write your own @numba.jit functions?

desert oar
#

the former is usually faster in my experience @safe tapir , unless you're chaining a lot of them together

merry fern
#

@merry fern but why?
@velvet thorn so that when I have no results, it doesnt say "empty" it gives a message.

velvet thorn
#

I...don't know if that's worth it

merry fern
#

yea, exploring the idea. not too familiar with classes but my next step is thinking about class objects i think...

#

also, thanks for your help again, i was able to get my code working! im going to try to learn flask now to render it to a web display

velvet thorn
#

I think this is a case where

#

you should recalibrate your brain to understand that "empty DF" -> "no data"

merry fern
#

its not for me, its for when the system looks to output the data. but i can also just do a conditional output

desert oar
#

or just write a pretty printing function

#

subclassing is intrusive

merry fern
#

^^yes

velvet thorn
#

or just write a pretty printing function
@desert oar then this

wheat pilot
#

im trying to run this pd.DataFrame(xFeat).columns essentially but i get an attribute error 'numpy.ndarray' object has no attribute 'columns' but i dont understand why since im creating it as a dataframe in the same snippet of code

desert oar
#

@wheat pilot can you share more of your code? maybe the error is from a different place in the code

#

and can you share the full error traceback?
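
For reference, the expression as written does work on its own, which suggests the error comes from a line that touches the raw array instead. A quick check with a made-up array standing in for `xFeat`:

```python
import numpy as np
import pandas as pd

xFeat = np.arange(6).reshape(3, 2)  # made-up stand-in for the real feature array

print(pd.DataFrame(xFeat).columns)  # works: default integer column labels

try:
    xFeat.columns  # wrapping the array above did not change xFeat itself
except AttributeError as exc:
    print(exc)
```
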

wheat pilot
#

although i may have a temporary solution for that error

#

let me know if what ive changed is not going to bite me in the butt later if you have a minute

cerulean ingot
#

how to fetch sheet name of csv?

#

im using pandas.dataframe

#

to read csv

#

can anyone help?

velvet thorn
#

how to fetch sheet name of csv?
@cerulean ingot what do you mean "sheet name"

cerulean ingot
velvet thorn
#

this thing in bottom of excel or csv sheet
@cerulean ingot CSVs don't have those

#

at least, not standard CSVs

#

only Excel files

cerulean ingot
#

and what about excel?

#

in excel how to read this name

velvet thorn
#

did you Google it?

#

I suggest you do, because I found the answer on the first try...

cerulean ingot
#

no 😀 I was doing csv then you said no so directly asked

#

@velvet thorn thanks

velvet thorn
#

yw but I didn't really do anything

cerulean ingot
#

I even appreciate reply 💯

cedar sky
#

Hi I need help Installing tensorflow anyone online

hasty grail
#

What issues are you running into?

cedar sky
#

Hello @hasty grail I need it for the tensorflow google examination

#

It shows msv or something is not installed

#

Can I share my screen and can you help me

hasty grail
#

just paste the error here

#

!pastebin

arctic wedgeBOT
#

Pasting large amounts of code

If your code is too long to fit in a codeblock in discord, you can paste your code here:
https://paste.pydis.com/

After pasting your code, save it by clicking the floppy disk icon in the top right, or by typing ctrl + S. After doing that, the URL should change. Copy the URL and post it here so others can see it.

cedar sky
#

ok one sec I will open pycharm

#

Traceback (most recent call last):
  File "C:/Users/HariAkash/PycharmProjects/TF_Config/venv/tf_first.py", line 1, in <module>
    import tensorflow
  File "C:\Users\HariAkash\PycharmProjects\TF_Config\venv\lib\site-packages\tensorflow\__init__.py", line 41, in <module>
    from tensorflow.python.tools import module_util as module_util
  File "C:\Users\HariAkash\PycharmProjects\TF_Config\venv\lib\site-packages\tensorflow\python\__init__.py", line 39, in <module>
    from tensorflow.python import pywrap_tensorflow as _pywrap_tensorflow
  File "C:\Users\HariAkash\PycharmProjects\TF_Config\venv\lib\site-packages\tensorflow\python\pywrap_tensorflow.py", line 28, in <module>
    self_check.preload_check()
  File "C:\Users\HariAkash\PycharmProjects\TF_Config\venv\lib\site-packages\tensorflow\python\platform\self_check.py", line 61, in preload_check
    % " or ".join(missing))
ImportError: Could not find the DLL(s) 'msvcp140_1.dll'. TensorFlow requires that these DLLs be installed in a directory that is named in your %PATH% environment variable. You may install these DLLs by downloading "Microsoft C++ Redistributable for Visual Studio 2015, 2017 and 2019" for your platform from this URL: https://support.microsoft.com/help/2977003/the-latest-supported-visual-c-downloads

#

This is the error

#

I am just 14 currently so I find it a bit difficult can you guide me

#

Hello @hasty grail are you there

hasty grail
#

Follow the link and the instructions on that page

cedar sky
#

It doesn't seem to help

hasty grail
#

What have you done exactly?

cedar sky
#

I just entered: import tensorflow

hasty grail
#

Have you completed this step?

cedar sky
#

I installed it many times but it still shows the same error

hasty grail
#

Which file did you download?

cedar sky
#

Shall I share my screen

#

and can you help me

hasty grail
#

Sorry I won't be on PC for long

cedar sky
#

oh ok no problem

#

Which file did you download?
@hasty grail microsoft redistributable 2019

hasty grail
#

Do you have to use any of your own files?

#

I feel that you might be better off simply using a Docker container otherwise

cedar sky
#

I feel that you might be better off simply using a Docker container otherwise
@hasty grail You need to use pycharm for the exam

#

I usually use Google Colab

hasty grail
#

ahh

#

what's your Python version

cedar sky
#

3.7 which is the one supported for the exam

hasty grail
#

can you open your Visual Studio Installer and check whether you can find the C++ Redistributable?

cedar sky
#

let me try

#

just a sec

hasty grail
#

haven't used the installer for a while but I think you should be able to select what packages for VS to install on there

#

look under C++

cedar sky
#

ok

#

two mins

#

This is what it shows

hasty grail
#

that's not the installer

#

that is visual studio itself

#

go to your Start Menu and search Visual Studio Installer or something

cedar sky
hasty grail
#

yeah that one

cedar sky
#

This one ???

hasty grail
#

go to the "Installed" tab

cedar sky
#

ok

#

only visual studio community edition is there

hasty grail
#

click "more"

cedar sky
#

ok

#

then

hasty grail
#

what do you see now

cedar sky
#

wait let me send

#

What to do next

hasty grail
#

modify

cedar sky
#

kk

hasty grail
#

go to "Language packs"

cedar sky
#

ok

#

That just shows languages like 'English', 'Tamil', 'Japanese'

hasty grail
#

hmm ok