#data-science-and-ml | Python | Page 223

woven saffron May 7, 2020, 11:34 PM

#

Maybe its too light?

#

I am normalizing the greyscale to a range between 0 and 1

#

By dividing by 255.0

flat quest May 7, 2020, 11:35 PM

#

well thats the index
are u sure the index correlates with the same class value?

woven saffron May 7, 2020, 11:35 PM

#

I am not sure of that

#

I assumed it was because of the way it was trained

#

Is there a way I can find this out?

#

Can I do like model.labels[idx] or something?

#

Not exactly that

#

But somehow get the label for it

flat quest May 7, 2020, 11:39 PM

#

well it'll follow the datasets label format
oh wait this is sparse, so that shouldnt be an issue

woven saffron May 7, 2020, 11:40 PM

#

Let me try with other numbers

flat quest May 7, 2020, 11:40 PM

#

yeah it might just be one bad one

woven saffron May 7, 2020, 11:41 PM

#

Yeah something is weird

#

I would assume the model wouldn't be this bad

#

It is predicting 3 for everything

#

Lol

flat quest May 7, 2020, 11:42 PM

#

lol

woven saffron May 7, 2020, 11:42 PM

#

PS C:\Users\Ryan\Desktop\ml-api> python .\predict.py
{'confidence': [1.6521667149409522e-18, 2.794477973674936e-12, 0.002019522013142705, 0.9946495890617371, 0.0, 0.002819732530042529, 2.1464751850941433e-11, 0.0005111345089972019, 3.013474395628175e-32, 9.863308
PS C:\Users\Ryan\Desktop\ml-api> python .\predict.py
{'confidence': [1.6521667149409522e-18, 2.794477973674936e-12, 0.002019522013142705, 0.9946495890617371, 0.0, 0.002819732530042529, 2.1464751850941433e-11, 0.0005111345089972019, 3.013474395628175e-32, 9.863308066247621e-36], 'prediction': 3}
PS C:\Users\Ryan\Desktop\ml-api> python .\predict.py
{'confidence': [1.6521667149409522e-18, 2.794477973674936e-12, 0.002019522013142705, 0.9946495890617371, 0.0, 0.002819732530042529, 2.1464751850941433e-11, 0.0005111345089972019, 3.013474395628175e-32, 9.863308066247621e-36], 'prediction': 3}
PS C:\Users\Ryan\Desktop\ml-api> python .\predict.py
{'confidence': [2.4253387200669225e-17, 1.805287798591071e-12, 0.08808434754610062, 0.868319034576416, 0.0, 0.042480047792196274, 7.983710914594155e-11, 0.0011165498290210962, 1.889110677748859e-29, 2.5717616017503705e-33], 'prediction': 3}
PS C:\Users\Ryan\Desktop\ml-api> python .\predict.py

#

This is multiple runthroughs

#

All with different digits

#

I had this issue before where it was predicting 5 for everything

flat quest May 7, 2020, 11:53 PM

#

hm

#

mine is working fine lol

#

and its the same code as yours

#

whats the shape of the digit_data?

#

it should be sent as a batch size of one -> np.array([img])
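
A minimal sketch of the "batch of one" idea, assuming a 28x28 greyscale array (the `img` below is a made-up stand-in, not the real digit). It suggests that wrapping in a list, `np.reshape`, and `np.expand_dims` all give the same `(1, 28, 28)` batch, so the choice between them should not change the prediction:

```python
import numpy as np

# Hypothetical stand-in for a 28x28 greyscale digit already scaled to [0, 1].
img = np.linspace(0.0, 1.0, 28 * 28).reshape(28, 28)

batch_a = np.array([img])               # wrap in a list -> shape (1, 28, 28)
batch_b = np.reshape(img, (1, 28, 28))  # explicit reshape
batch_c = np.expand_dims(img, axis=0)   # add a leading batch axis

print(batch_a.shape)
print(np.array_equal(batch_a, batch_b) and np.array_equal(batch_a, batch_c))
```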

woven saffron May 7, 2020, 11:55 PM

#

I am doing this

#

With my image

#

np.reshape(arr, (1, 28, 28))

flat quest May 7, 2020, 11:56 PM

#

yup thats the problem

#

just tested it
its probably not the same image when u reshape it

#

try plotting it before and after u reshape it

woven saffron May 7, 2020, 11:57 PM

#

When I receive the image I do this to it

#

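import numpy as np
from PIL import Image

def read_digit(data: bytes, encoding: str) -> np.ndarray:
    # Rebuild the 28x28 image from raw bytes; `encoding` is the PIL mode, e.g. 'L'
    im = Image.frombytes(encoding, (28, 28), data)
    im.save('im.png')  # saved so the received image can be inspected
    arr = np.array(im)
    arr = arr / 255.0  # scale 0..255 down to 0..1
    return np.array([arr])  # batch of one, shape (1, 28, 28)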

#

I just changed it

#

Still same

#

I am viewing the image I read

#

And it looks good

#

However I didn't consider what it looks like after I do arr = arr / 255.0

flat quest May 7, 2020, 11:59 PM

#

hm

#

yeah the reshaping looks fine after i looked over again, i forgot to update the ar values
normalizing shouldnt be an issue

woven saffron May 8, 2020, 12:00 AM

#

Yeah I am very confused

flat quest May 8, 2020, 12:01 AM

#

can u try running one of the images in the test dataset?

#

instead of a custom one

woven saffron May 8, 2020, 12:01 AM

#

Yeah

flat quest May 8, 2020, 12:03 AM

#

those probability output numbers look really similar actually, you might be running the same image into the detector each run

woven saffron May 8, 2020, 12:04 AM

#

Unfortunately that isn't the case

flat quest May 8, 2020, 12:04 AM

#

hmm
did u try running the test image?

woven saffron May 8, 2020, 12:04 AM

#

Yeah 1 sec

#

So it is supposed to be a 7

#

According to y_test

#

[[1.0039120e-06 4.1181980e-08 1.7208897e-05 1.5117071e-04 1.2190830e-10
  5.2620344e-06 7.6247827e-11 9.9981230e-01 7.5138960e-06 5.5338201e-06]]

#

It is a 7 here

#

So the image input is somehow screwing up

flat quest May 8, 2020, 12:09 AM

#

yeah that's what i suspected
since the models running as expected

its likely that ur referencing the same img somehow

can i see ur full code for getting the images and feeding them into the model?

woven saffron May 8, 2020, 12:09 AM

#

import requests
from PIL import Image
import numpy as np
from io import BytesIO
import tensorflow as tf

model = tf.keras.models.load_model('mnist.model')

def read_digit(fname):
    print(f'LOADING {fname}')
    img = Image.open(f'example-digits/{fname}').convert('L').resize((28, 28))
    return BytesIO(img.tobytes())


img = read_digit('4.png')

r = requests.post('http://127.0.0.1:5000/MNIST/predict', files={'file': img})

if r.status_code == 200:
    print(r.json())
else:
    print(f'Error: {r.status_code}')

#

I checked the fname and it changes

#

I can even show the image to make sure

#

📎 im.png

#

This is what gets sent with the 4

#

Aka the current code

#

Is the stroke weight of the images not high enough?

#

Maybe it needs to be thicker?

flat quest May 8, 2020, 12:13 AM

#

hm it could be, tho that doesn't explain why ur getting the same result
i would understand if the model performance is bad, but the probabilities are too similar each run

woven saffron May 8, 2020, 12:14 AM

#

Yeah I ran through a couple and the proper image is being loaded

#

I even check on the API side

#

And it is the same thing I sent

flat quest May 8, 2020, 12:14 AM

#

but the model works for normal images

#

from the test dataset

woven saffron May 8, 2020, 12:14 AM

#

Correct

#

Let me show what an image from the dataset looks like

#

Lets see if its way thicker

#

Wait...

#

test data array looks like

#

[0.         0.         0.         0.         0.         0.
  0.         0.         0.         0.         0.         0.
  0.         0.         0.         0.         0.         0.
  0.3254902  0.99215686 0.81960784 0.07058824 0.         0.
  0.         0.         0.         0.        ]
 [0.         0.         0.         0.         0.         0.
  0.         0.         0.         0.         0.         0.
  0.         0.         0.         0.         0.         0.08627451
  0.91372549 1.         0.3254902  0.         0.         0.
  0.         0.         0.         0.        ]

#

Meaning white is 0

flat quest May 8, 2020, 12:18 AM

#

from the test dataset or from ur images?

woven saffron May 8, 2020, 12:18 AM

#

So rn mine is inverted

#

Test dataset

#

So anything that is 255 needs to be 0

#

And then the blacks get normalized down to 0 to 1 scale
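
A rough sketch of the preparation described here, assuming the custom image is 8-bit greyscale with dark ink on a white background. The helper name `to_mnist_style` is made up for illustration:

```python
import numpy as np

def to_mnist_style(pixels):
    """Invert an 8-bit greyscale image (dark ink on white) and scale to [0, 1]."""
    arr = np.asarray(pixels, dtype=np.float64)
    # White (255) becomes 0.0 (background), black (0) becomes 1.0 (ink).
    return (255.0 - arr) / 255.0

# Tiny made-up image: mostly white, one black pixel, one mid-grey pixel.
sample = np.array([[255, 255, 0], [255, 128, 255]], dtype=np.uint8)
prepared = to_mnist_style(sample)
print(prepared)
```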

flat quest May 8, 2020, 12:19 AM

#

? but the number isnt white

woven saffron May 8, 2020, 12:19 AM

#

Well there is lots of 0 which I assume is supposed to be the background

#

This is just 2 rows actually

#

I tried opening this as image

#

And its pure black

#

Even after doing it *= 255

#

To bring it back

#

first_x *= 255.0

img = Image.fromarray(first_x, mode='L')
img.show()
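
One possible reason for the all-black picture: Pillow's 'L' mode stores one unsigned byte per pixel, so a float array probably needs converting to `uint8` before it goes to `Image.fromarray`. A sketch of just the conversion step, using a few values copied from the test row above (the Pillow call is left as a comment so the snippet only needs NumPy):

```python
import numpy as np

# A few normalized values taken from the test-data row shown earlier.
first_x = np.array([[0.0, 0.3254902, 0.99215686, 1.0]])

# Scale back to 0..255, round, and store as 8-bit integers.
as_uint8 = np.rint(first_x * 255.0).astype(np.uint8)
print(as_uint8)
print(as_uint8.dtype)

# With Pillow installed, Image.fromarray(as_uint8) could then build the image.
```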

#

Yeah so I made sure

#

0 is background

#

Values > 0 is digit pixels

#

Check this https://paste.pythondiscord.com/yuwuvoguqu.py

#

I didn't mess with print options a lot

#

But you can see the numbers make a 7 lol

#

Which means I need to prep my data differently

#

255 (white) -> 0

#

Anything else gets scaled

#

Yeah @flat quest I am getting different values now but still some are wrong

flat quest May 8, 2020, 12:32 AM

#

are most still correct tho?

woven saffron May 8, 2020, 12:32 AM

#

When I send a 1, I get this

flat quest May 8, 2020, 12:32 AM

#

its the convert ("L") thats making it black

#

try removing that

woven saffron May 8, 2020, 12:33 AM

#

https://paste.pythondiscord.com/usuhocorun.py

#

But it predicts this as a 4

flat quest May 8, 2020, 12:33 AM

#

lol u shouldnt be getting 0's for all values, those probabilities should sum to 1
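
A quick way to check the "should sum to 1" point, sketched with the confidence list from one of the earlier `predict.py` runs. The tolerance is a guess that allows for float32 rounding in the model output:

```python
import math

# Confidence values copied from an earlier predict.py run in this log.
confidence = [1.6521667149409522e-18, 2.794477973674936e-12, 0.002019522013142705,
              0.9946495890617371, 0.0, 0.002819732530042529, 2.1464751850941433e-11,
              0.0005111345089972019, 3.013474395628175e-32, 9.863308066247621e-36]

total = sum(confidence)
# Index of the largest probability, i.e. the predicted class.
prediction = max(range(len(confidence)), key=confidence.__getitem__)
sums_to_one = math.isclose(total, 1.0, abs_tol=1e-6)

print(prediction, sums_to_one)
```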

woven saffron May 8, 2020, 12:33 AM

#

This is the pixel values

#

So the actual pixels that make up the digit get scaled between 0 and 1

#

And background is 0

#

And I need to somehow encode the image to get it in a 28x28 array so it isn't rgba

woven saffron May 8, 2020, 12:56 AM

#

Managed to invert the image so it is what the model expects, but still nada

#

Oh well, will try tomorrow

flat quest May 8, 2020, 12:57 AM

#

yeah lol this is a lot harder than expected to solve

woven saffron May 8, 2020, 12:59 AM

#

📎 unknown.png

#

This is the image I am sending

#

Before it gets normalized by dividing by 255

#

The downscaling is messing with it a lot

#

I might need to multiply all the pixels that aren't hard black up by some factor

#

📎 6_2828.png

#

Sending this through works

#

So it is 100% a downscaling issue @flat quest last time I tag you sorry

#

Just thought you might wanna know

#

I drew the digit in a 28x28 px window so it didnt need to downscale

#

And of course it works

flat quest May 8, 2020, 1:15 AM

#

yeah no all good
tho i still don't get why it doesnt work tbh

woven saffron May 8, 2020, 1:17 AM

#

I think it is too noisy

flat quest May 8, 2020, 1:17 AM

#

i wouldve expected that a model could work on lighter images of numbers but i guess it just needed to be trained on lighter images as well

woven saffron May 8, 2020, 1:18 AM

#

Yeah I think the dataset has a lot of large stroke images

#

So I might just do like

#

If its not black scale the white up to 255

#

Or set a very low threshold

flat quest May 8, 2020, 1:18 AM

#

the stroke and also the color might be an issue
u could add those kinds of images in ur training data

through data augmentation

woven saffron May 8, 2020, 1:18 AM

#

Like anything > 50 gets set to 255
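
The threshold idea could look something like this. It assumes the image has already been inverted (background 0, ink bright) and is still on the 0..255 scale; `boost_strokes` and the cutoff of 50 are illustrative only:

```python
import numpy as np

def boost_strokes(pixels, threshold=50):
    """Push every pixel brighter than `threshold` up to full intensity (255)."""
    arr = np.asarray(pixels, dtype=np.uint8)
    # Pixels at or below the threshold are left untouched.
    return np.where(arr > threshold, 255, arr).astype(np.uint8)

# Made-up row: background, faint noise, and two faded stroke pixels.
faint = np.array([[0, 30, 51, 200]], dtype=np.uint8)
boosted = boost_strokes(faint)
print(boosted)
```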

#

What is data augmentation?

flat quest May 8, 2020, 1:19 AM

#

its basically transforming some of ur data so u have a greater variety
u might change the brightness, rotation, shear, randomly change a color channel, etc.

#

it allows the model to learn more types of objects
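
A very small augmentation sketch in plain NumPy, assuming images are `[0, 1]` greyscale arrays. The `augment` helper, the shift range, and the brightness range are all made up for illustration. Real pipelines usually rely on a library for this:

```python
import numpy as np

def augment(img, rng, max_shift=2, brightness=(0.6, 1.0)):
    """Return a randomly shifted and dimmed copy of a [0, 1] greyscale image."""
    dy, dx = rng.integers(-max_shift, max_shift + 1, size=2)
    # np.roll wraps pixels around the edges; fine for small shifts of a centred digit.
    shifted = np.roll(img, shift=(int(dy), int(dx)), axis=(0, 1))
    scale = rng.uniform(*brightness)  # random dimming factor for lighter strokes
    return np.clip(shifted * scale, 0.0, 1.0)

rng = np.random.default_rng(0)
digit = np.zeros((28, 28))
digit[10:18, 13:15] = 1.0  # crude vertical stroke standing in for a "1"

variants = np.stack([augment(digit, rng) for _ in range(4)])
print(variants.shape)
```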

obtuse skiff May 8, 2020, 3:18 AM

#

I have two images that are binary images and need to find the similarity. I was looking at using Pearsons (np.corrcoef). how do I go about doing this

#

what am I putting in the X and Y

unreal thistle May 8, 2020, 6:00 AM

#

Hey guys

merry ridge May 8, 2020, 7:32 AM

#

Just reshape your images into column vectors and throw them in.

lapis sequoia May 8, 2020, 10:37 AM

#

what you need is the inner dot product

#

look up cosine similarity

#

but what sort of images are these

#

what do you mean by binary images
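
A sketch of both suggestions, assuming "binary images" means same-sized arrays of 0s and 1s (the two arrays here are invented). Each image is flattened to a vector. Those two vectors are the X and Y for `np.corrcoef`, and cosine similarity is the dot product divided by the norms. Note that Pearson is undefined if either image is constant:

```python
import numpy as np

# Two made-up 3x3 binary images.
a = np.array([[1, 0, 1], [0, 1, 0], [1, 1, 0]])
b = np.array([[1, 0, 0], [0, 1, 0], [1, 1, 1]])

# Flatten each image into a 1-D vector.
x = a.ravel().astype(float)
y = b.ravel().astype(float)

# np.corrcoef returns a 2x2 matrix; the off-diagonal entry is the correlation.
pearson = np.corrcoef(x, y)[0, 1]
cosine = x @ y / (np.linalg.norm(x) * np.linalg.norm(y))

print(round(pearson, 2), round(cosine, 2))
```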

jolly briar May 8, 2020, 10:45 AM

#

@lapis sequoia is the outer product ever referred to as dot?

#

I thought that was only used for inner product

lapis sequoia May 8, 2020, 10:46 AM

#

you're right

#

inner product is the dot product.. inner dot product is a weird way of saying it

deft jacinth May 8, 2020, 10:47 AM

#

can i install postgresql + pgadmin so i can have a web ui thing on an ubuntu 18.04 vps

lapis sequoia May 8, 2020, 10:51 AM

#

you want #databases

deft jacinth May 8, 2020, 10:51 AM

#

oh i misread im dumb sorry

faint furnace May 8, 2020, 11:15 AM

#

How do I delete rows in which birthYear is \N? dropna() doesnt work

📎 unknown.png

flat bough May 8, 2020, 11:57 AM

#

@faint furnace \N is not a missing value. dropna() removes only missing ones. Just drop

past pewter May 8, 2020, 11:58 AM

#

^ https://stackoverflow.com/a/51659326

faint furnace May 8, 2020, 12:09 PM

#

i was trying something like this but its not even recognising it

📎 unknown.png

#

📎 unknown.png

#

is it because my "birthYear" is object type?

#

📎 unknown.png

flat bough May 8, 2020, 12:17 PM

#

try typing \\N, because inside a string \N is read as a special symbol, just like \n or \t

#

📎 Screen_Shot_2020-05-08_at_15.17.49.png

faint furnace May 8, 2020, 12:18 PM

#

📎 unknown.png

#

i am also trying to replace the "\" but i think my code is wrong

flat bough May 8, 2020, 12:18 PM

#

if you want to use \ in a string you need to type it twice

#

https://www.w3schools.com/python/gloss_python_escape_characters.asp

faint furnace May 8, 2020, 12:19 PM

#

it didnt give error but didnt remove the \

📎 unknown.png

flat bough May 8, 2020, 12:25 PM

#

maybe it changes it only in the first occurrence?

faint furnace May 8, 2020, 12:26 PM

#

Ohh

📎 unknown.png

#

now this removed everything

#

noice

#

i did this to change \N to NaN

📎 unknown.png

#

but it didnt consider it as empty value

📎 unknown.png

paper niche May 8, 2020, 2:04 PM

#

try replacing by np.nan? ‘NaN’ is still a string
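
A sketch of the two routes discussed in this thread, on an invented three-row frame (the real data is only visible in the screenshots). A raw string `r"\N"` avoids the escaping problem. Either filter the placeholder out directly, or swap it for `np.nan` so that `dropna()` can see it:

```python
import numpy as np
import pandas as pd

# Made-up sample; only the birthYear column name comes from the question.
df = pd.DataFrame({"name": ["a", "b", "c"],
                   "birthYear": ["1950", r"\N", "1972"]})

# Option 1: keep only rows whose birthYear is not the literal string \N.
kept = df[df["birthYear"] != r"\N"]

# Option 2: replace the placeholder with a real missing value, then dropna().
cleaned = df.assign(
    birthYear=df["birthYear"].replace(r"\N", np.nan)
).dropna(subset=["birthYear"])

print(kept["name"].tolist(), cleaned["name"].tolist())
```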

jagged plume May 8, 2020, 2:04 PM

#

Hi, sorry to interrupt, can someone help with contour sorting? 😄

#

I built a custom dataset and it's quite accurate, I just need to sort from left to right now...

📎 2020-05-08_15-45-52.mov

#

Any help would be greatly appreciated!

jolly briar May 8, 2020, 2:24 PM

#

@faint furnace try posting a small data example, df.to_json() will enable you to do that

solar phoenix May 8, 2020, 2:38 PM

#

Hi, I dont know if someone can help me. I have a set of letters (they are amino acids), I have 6 of them. I want to get every combination of 6 possible, including just 20 repetitions of one of them. Perhaps the fact that there will be millions of them means that this is basically impossible?

#

note: i want the combinations to be 20 letters long

jolly briar May 8, 2020, 2:38 PM

#

@solar phoenix what's the list?

solar phoenix May 8, 2020, 2:39 PM

#

The letters are I, A, G, L,F, V, M

jolly briar May 8, 2020, 2:39 PM

#

list(itertools.permutations(['i', 'a', 'g', 'l', 'f', 'v', 'm']))

faint furnace May 8, 2020, 2:40 PM

#

Thank you forcousteau helped me with that.

jolly briar May 8, 2020, 2:40 PM

#

@faint furnace sure - i mean in general though, that is a useful thing to do

faint furnace May 8, 2020, 2:40 PM

#

yea i actually am checking what this line of code does

#

takes a little while to run it

solar phoenix May 8, 2020, 2:41 PM

#

rie this gives me a list that is 7 long

#

i want one that is 20 long

#

so for example, one output would be LLLLLLLLLLLLLLLLLLLL

jolly briar May 8, 2020, 2:41 PM

#

there's one in there that matches that

#

oh hang on no, i don't understand why you would get that from a permutation of those elements

#

every combination of 6 possible, including just 20 repetitions of one of them

i don't follow

solar phoenix May 8, 2020, 2:43 PM

#

i want to create strings of length 20

#

that include absolutely every combination of those 7 elements

#

it is probably going to be too many isn't it

jolly briar May 8, 2020, 2:43 PM

#

it's going to be a lot

solar phoenix May 8, 2020, 2:43 PM

#

ye

spark stag May 8, 2020, 2:44 PM

#

i think its 20**20 combinations?

#

oh wait 7**20

solar phoenix May 8, 2020, 2:45 PM

#

yeah

#

7**20

spark stag May 8, 2020, 2:45 PM

#

something like that

solar phoenix May 8, 2020, 2:45 PM

#

is too much

spark stag May 8, 2020, 2:45 PM

#

a LOT

solar phoenix May 8, 2020, 2:45 PM

#

ok

#

i'll re think

#

thanks

spark stag May 8, 2020, 2:45 PM

#

what u need it for? xD

solar phoenix May 8, 2020, 2:46 PM

#

each of the letters represent amino acids

#

and i know the properties i want

jolly briar May 8, 2020, 2:46 PM

#

7^20 ? where's that from?

solar phoenix May 8, 2020, 2:46 PM

#

in length 20

jolly briar May 8, 2020, 2:46 PM

#

there are 7! ways to arrange 7 things, you have 20 spaces... I'm trying to remember combinations etc 🤦‍♂️

silk acorn May 8, 2020, 2:46 PM

#

Is it not 20**7?

solar phoenix May 8, 2020, 2:47 PM

#

so i want to make a list of all of them

spark stag May 8, 2020, 2:47 PM

#

maybe but with binary 10 digits it 2**10 ways, this has 7 states length 20

jolly briar May 8, 2020, 2:47 PM

#

20^7, now where's that from?

silk acorn May 8, 2020, 2:47 PM

#

20 * 20 * 20 etc for each char

solar phoenix May 8, 2020, 2:47 PM

#

yeah Grote, i think you are right

#

1280000000

silk acorn May 8, 2020, 2:47 PM

#

Anyway, you are looking at itertools.combinations_with_replacement

jolly briar May 8, 2020, 2:48 PM

#

Idk, 20^7 sounds off

silk acorn May 8, 2020, 2:48 PM

#

20 options for each of 7 characters.
Wait now that I type it out that is the wrong way round

#

7 ** 20 indeed.

spark stag May 8, 2020, 2:48 PM

#

binary has 2 states so number of states is 2**length so with 7 states i thought it would be 7**length tho

#

yh, but tbh i just guessed the order, there was a 7 and a 20 and the answer is very big

jolly briar May 8, 2020, 2:49 PM

#

Well how many ways are there to arrange the 7 characters?

#

it's going to be 7! right?

#

no exponent there

silk acorn May 8, 2020, 2:49 PM

#

That's without replacement though.

#

They said 20 * the same letter was a valid option

solar phoenix May 8, 2020, 2:50 PM

#

yep same letter valid

jolly briar May 8, 2020, 2:50 PM

#

That's without replacement though.
ah yeah , shit

spark stag May 8, 2020, 2:50 PM

#

i was thinking of it as a base 7 number problem with 20 digits: how many 5-digit decimal numbers are there, 00000-99999 = 100000 = 10 (number base) ** 5 (length of number)
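
The base-n counting argument can be checked on a small case. The sketch below uses `itertools.product`, which is not something anyone in the chat has proposed at this point. It enumerates every ordered string over an alphabet, so the count is alphabet size to the power of the length:

```python
import itertools

def count_strings(alphabet_size, length):
    """Number of ordered strings of `length` symbols, repeats allowed."""
    return alphabet_size ** length

# Small case that can actually be enumerated: binary strings of length 3.
small = list(itertools.product("01", repeat=3))
print(len(small), count_strings(2, 3))

# The decimal example and the 7-letter, length-20 case from the discussion.
print(count_strings(10, 5))
print(count_strings(7, 20))
```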

silk acorn May 8, 2020, 2:50 PM

#

itertools.combinations_with_replacement(['i', 'a', 'g', 'l', 'f', 'v', 'm'], 20)

spark stag May 8, 2020, 2:51 PM

#

is that even a good idea to run? how long do u think that will take xD

silk acorn May 8, 2020, 2:51 PM

#

0 seconds.

spark stag May 8, 2020, 2:51 PM

#

really!?

silk acorn May 8, 2020, 2:51 PM

#

Since it doesn't actually make the strings right away, until you use them.

jolly briar May 8, 2020, 2:51 PM

#

creates a generator

silk acorn May 8, 2020, 2:51 PM

#

It creates a generator.

solar phoenix May 8, 2020, 2:52 PM

#

ah

spark stag May 8, 2020, 2:52 PM

#

oh ok thats a more efficient way to do it

solar phoenix May 8, 2020, 2:52 PM

#

so when i loop through this

#

that is when it will become an issue

spark stag May 8, 2020, 2:52 PM

#

so as long as u dont call list on it you're ok
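
A small sketch of the laziness point: building the iterator costs nothing, and `itertools.islice` pulls only as many items as are asked for. The letters are the ones given earlier in the thread:

```python
import itertools

letters = ["i", "a", "g", "l", "f", "v", "m"]

# Creating the iterator does no work yet; nothing is generated here.
lazy = itertools.combinations_with_replacement(letters, 20)

# Only the first three tuples are ever produced.
first_three = ["".join(t) for t in itertools.islice(lazy, 3)]
print(first_three)
```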

jolly briar May 8, 2020, 2:52 PM

#

In [101]: len(list(itertools.combinations_with_replacement(['a', 'b', 'c', 'd', 'e', 'f', 'g'], 20)))
Out[101]: 230230

silk acorn May 8, 2020, 2:53 PM

#

Python can do that much yeah.

jolly briar May 8, 2020, 2:53 PM

#

this is fine

solar phoenix May 8, 2020, 2:53 PM

#

thanks for this all

jolly briar May 8, 2020, 2:53 PM

#

this is way smaller than some of those numbers though 🤔

#

In [108]: samp = pd.Series([''.join(x) for x in itertools.combinations_with_replacement(['a', 'b', 'c', 'd', 'e', 'f', 'g'], 20)])

In [109]: samp.sample(20)
Out[109]:
113891    aabbbcccceeeeffffffg
131937    aaccdeeffffggggggggg
115873    aabbbdddddeeeeeffggg
82491     aaabbbbcdeeefffffggg
173967    accdeeeeeeffffgggggg
90337     aaabcccccccccddddeee
56377     aaaabbbbbbccccceefff
101120    aabbbbbbbbbbbbbcdeeg
224770    cccddddddeeeeggggggg
216041    bccdddddfffffffggggg
115529    aabbbceeeeeeeeeegggg
131396    aaccddddddeeeeefgggg
95043     aaaccccccccccccddeef
4470      aaaaaaaaaaacccdfgggg
65145     aaaabbceeeeeffffffgg
16792     aaaaaaaaccccccccdeff
36986     aaaaaacccdgggggggggg
216190    bccdddeeeeeeffffffff
100210    aaadddddeeeefffffggg
23192     aaaaaaabcccccddfffff
dtype: object

#

looks alright

eager heath May 8, 2020, 2:56 PM

#

You might want to use a generator expression instead, if pandas is okay with that

jolly briar May 8, 2020, 2:56 PM

#

for what? this is fine

#

In [112]: pd.DataFrame(samp).info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 230230 entries, 0 to 230229
Data columns (total 1 columns):
 #   Column  Non-Null Count   Dtype
---  ------  --------------   -----
 0   0       230230 non-null  object
dtypes: object(1)
memory usage: 1.8+ MB

eager heath May 8, 2020, 2:56 PM

#

You could have a pretty big memory burst here, and it could fail in some circumstances

#

I mean, the list

jolly briar May 8, 2020, 2:56 PM

#

there's really not much

eager heath May 8, 2020, 2:57 PM

#

The lists have way more overhead compared to a dataframe

jolly briar May 8, 2020, 2:57 PM

#

really, that surprises me

#

not too sure how to check the memory usage for that though

eager heath May 8, 2020, 2:57 PM

#

Well, it doesn't hurt, you just need to change the [] by ()

#

You can use sizeof()

jolly briar May 8, 2020, 2:57 PM

#

In [113]: pd.DataFrame(samp).info(memory_usage='deep')
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 230230 entries, 0 to 230229
Data columns (total 1 columns):
 #   Column  Non-Null Count   Dtype
---  ------  --------------   -----
 0   0       230230 non-null  object
dtypes: object(1)
memory usage: 16.9 MB

with deep

eager heath May 8, 2020, 2:58 PM

#

But it will not count strings, which have even more overhead

jolly briar May 8, 2020, 2:58 PM

#

how to get the memory usage of a list then

#

In [118]: l.__sizeof__()
Out[118]: 1880784

this is inaccurate?

#

In [119]: l = [''.join(x) for x in itertools.combinations_with_replacement(['a', 'b', 'c', 'd', 'e', 'f', 'g'], 20)]

that's l

#

that seems close with the original dataframe response

eager heath May 8, 2020, 2:59 PM

#

You also need to count all the strings inside

jolly briar May 8, 2020, 3:00 PM

#

why would it do them separately? but ok, i'll check

eager heath May 8, 2020, 3:00 PM

#

Here it is just counting the chain of references, which are more or less lightweight
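
A rough sketch of the shallow versus deep distinction, on a smaller list than the one in the chat. `sys.getsizeof` on a list reports only the list object and its array of references. Adding the size of each string gives an estimate closer to the real footprint, though it would double-count any string object that appears more than once:

```python
import itertools
import sys

strings = ["".join(t) for t in itertools.combinations_with_replacement("abc", 4)]

# Size of the list object itself: header plus one pointer per element.
shallow = sys.getsizeof(strings)

# Add the size of every string the list refers to.
deep = shallow + sum(sys.getsizeof(s) for s in strings)

print(len(strings), shallow, deep)
```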

solar phoenix May 8, 2020, 3:00 PM

#

I don't know how it can only be 230230

#

it must be more

jolly briar May 8, 2020, 3:00 PM

#

In [120]: x= ''.join(str(x) for x in l)

In [121]: x.__sizeof__()
Out[121]: 4604649

#

seems pretty small still

silk acorn May 8, 2020, 3:01 PM

#

!e
print(7**20)

arctic wedgeBOT May 8, 2020, 3:01 PM

#

@silk acorn :x: Your eval job has completed with return code 1.

001 |   File "<string>", line 1
002 |     print(7"*20)
003 |                ^
004 | SyntaxError: EOL while scanning string literal

silk acorn May 8, 2020, 3:01 PM

#

!e
print(7**20)

arctic wedgeBOT May 8, 2020, 3:01 PM

#

@silk acorn :white_check_mark: Your eval job has completed with return code 0.

79792266297612001

jolly briar May 8, 2020, 3:01 PM

#

that's a big number

#

In [127]: samp[samp.str.startswith('f')]
Out[127]:
230209    ffffffffffffffffffff
230210    fffffffffffffffffffg
230211    ffffffffffffffffffgg
230212    fffffffffffffffffggg
230213    ffffffffffffffffgggg
230214    fffffffffffffffggggg
230215    ffffffffffffffgggggg
230216    fffffffffffffggggggg
230217    ffffffffffffgggggggg
230218    fffffffffffggggggggg
230219    ffffffffffgggggggggg
230220    fffffffffggggggggggg
230221    ffffffffgggggggggggg
230222    fffffffggggggggggggg
230223    ffffffgggggggggggggg
230224    fffffggggggggggggggg
230225    ffffgggggggggggggggg
230226    fffggggggggggggggggg
230227    ffgggggggggggggggggg
230228    fggggggggggggggggggg

#

neat

spark stag May 8, 2020, 3:04 PM

#

nice to look at too

solar phoenix May 8, 2020, 3:04 PM

#

there should be examples of fgfffffffffffffff

spark stag May 8, 2020, 3:04 PM

#

probably somewhere (way) further down the dataset

jolly briar May 8, 2020, 3:05 PM

#

no these are sorted

#

you can also see the index, and the previous len

#

In [130]: list(itertools.combinations_with_replacement(['a', 'b', 'c'], 4))
Out[130]:
[('a', 'a', 'a', 'a'),
 ('a', 'a', 'a', 'b'),
 ('a', 'a', 'a', 'c'),
 ('a', 'a', 'b', 'b'),
 ('a', 'a', 'b', 'c'),
 ('a', 'a', 'c', 'c'),
 ('a', 'b', 'b', 'b'),
 ('a', 'b', 'b', 'c'),
 ('a', 'b', 'c', 'c'),
 ('a', 'c', 'c', 'c'),
 ('b', 'b', 'b', 'b'),
 ('b', 'b', 'b', 'c'),
 ('b', 'b', 'c', 'c'),
 ('b', 'c', 'c', 'c'),
 ('c', 'c', 'c', 'c')]

solar phoenix May 8, 2020, 3:07 PM

#

dislist=list(itertools.combinations_with_replacement(['a', 'b'],3))

#

[('a', 'a', 'a'), ('a', 'a', 'b'), ('a', 'b', 'b'), ('b', 'b', 'b')]

#

there is no 'b''b''c'

#

for example

jolly briar May 8, 2020, 3:08 PM

#

well you couldn't have c in your example

#

it's not in the list

solar phoenix May 8, 2020, 3:08 PM

#

oh yeah

#

lol

#

sorry

jolly briar May 8, 2020, 3:08 PM

#

there's no babb in the version that i posted above tho

spark stag May 8, 2020, 3:08 PM

#

but your example only contains half the samples it should have, was that the whole output?

jolly briar May 8, 2020, 3:08 PM

#

mine, yes

spark stag May 8, 2020, 3:09 PM

#

should be length 81, its not making every combination

jolly briar May 8, 2020, 3:09 PM

#

it's making what i posted

spark stag May 8, 2020, 3:10 PM

#

its like the original one only made like 230000 but should have had way more

solar phoenix May 8, 2020, 3:10 PM

#

ye agree

jolly briar May 8, 2020, 3:10 PM

#

yeah, it's not done babb in the small example

solar phoenix May 8, 2020, 3:12 PM

#

I think there will be too many examples for this, I will have to think of a new approach

#

thanks all for your help

silk acorn May 8, 2020, 3:16 PM

#

Oh, looks like combinations only gives the sorted ones

#

My bad
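
The gap between 230230 and 7**20 comes from order: `combinations_with_replacement` treats `aab` and `aba` as the same selection and only emits the sorted form. The number of such selections is the standard multiset coefficient C(n + k - 1, k), which a short sketch can confirm (the helper name is made up):

```python
import itertools
import math

def multiset_count(n_symbols, length):
    """Unordered selections of `length` items from `n_symbols`, repeats allowed."""
    return math.comb(n_symbols + length - 1, length)

print(multiset_count(7, 20))  # 230230, the length reported earlier
print(multiset_count(3, 4))   # 15, the size of the small a/b/c example

# Cross-check the small case against itertools itself.
small = list(itertools.combinations_with_replacement("abc", 4))
print(len(small))
```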

jagged plume May 8, 2020, 3:17 PM

#

Anyone can offer a helping hand on contour sorting using Tensorflow and OpenCV, please?

jolly briar May 8, 2020, 3:29 PM

#

@solar phoenix

{('a', 'a', 'a', 'a'),
 ('a', 'a', 'a', 'b'),
 ('a', 'a', 'a', 'c'),
 ('a', 'a', 'b', 'b'),
 ('a', 'a', 'b', 'c'),
 ('a', 'a', 'c', 'b'),
 ('a', 'a', 'c', 'c'),
 ('a', 'b', 'b', 'b'),
 ('a', 'b', 'b', 'c'),
 ('a', 'b', 'c', 'c'),
 ('a', 'c', 'b', 'b'),
 ('a', 'c', 'c', 'b'),
 ('a', 'c', 'c', 'c'),
 ('b', 'a', 'a', 'a'),
 ('b', 'a', 'a', 'c'),
 ('b', 'a', 'c', 'c'),
 ('b', 'b', 'a', 'a'),
 ('b', 'b', 'a', 'c'),
 ('b', 'b', 'b', 'a'),
 ('b', 'b', 'b', 'b'),
 ('b', 'b', 'b', 'c'),
 ('b', 'b', 'c', 'a'),
 ('b', 'b', 'c', 'c'),
 ('b', 'c', 'a', 'a'),
 ('b', 'c', 'c', 'a'),
 ('b', 'c', 'c', 'c'),
 ('c', 'a', 'a', 'a'),
 ('c', 'a', 'a', 'b'),
 ('c', 'a', 'b', 'b'),
 ('c', 'b', 'a', 'a'),
 ('c', 'b', 'b', 'a'),
 ('c', 'b', 'b', 'b'),
 ('c', 'c', 'a', 'a'),
 ('c', 'c', 'a', 'b'),
 ('c', 'c', 'b', 'a'),
 ('c', 'c', 'b', 'b'),
 ('c', 'c', 'c', 'a'),
 ('c', 'c', 'c', 'b'),
 ('c', 'c', 'c', 'c')}

this was it right?

#

although what i've just written doesn't want to scale lol

solar phoenix May 8, 2020, 3:52 PM

#

@jolly briar yeah this is it

#

What did you run

jolly briar May 8, 2020, 3:53 PM

#

@solar phoenix what i wrote didn't really scale

solar phoenix May 8, 2020, 3:53 PM

#

Oh

jolly briar May 8, 2020, 3:53 PM

#

i mean - it might have run with patience, i didn't have patience

#

@solar phoenix doesn't this pattern have a name?

#

i'd have thought it would have been done before somewhere and you could just use their file / data

solar phoenix May 8, 2020, 3:54 PM

#

Yeah I thought that too

#

What exactly did you run to get that?

jolly briar May 8, 2020, 3:55 PM

#

i'll get the code it should be in memory

solar phoenix May 8, 2020, 3:55 PM

#

@jolly briar cool thanks

jolly briar May 8, 2020, 3:55 PM

#

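import itertools
import string

unique_chars = 3   # size of the alphabet
string_len = 4     # length of each output tuple
permutations = itertools.permutations(list(string.ascii_letters[:unique_chars]))

# For every ordering of the alphabet, collect its combinations with replacement
all_combinations = []
for perm in permutations:
    c = list(itertools.combinations_with_replacement(perm, string_len))
    all_combinations.append(c)

# Flatten the per-permutation lists and drop duplicates
all_combo_list = list(itertools.chain.from_iterable(all_combinations))
all_combo_list_unique = set(all_combo_list)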

solar phoenix May 8, 2020, 3:56 PM

#

Oh nice

jolly briar May 8, 2020, 3:56 PM

#

this will generate the above list, you can change params 3,4 at the top there

solar phoenix May 8, 2020, 3:56 PM

#

Yeah I see that

jolly briar May 8, 2020, 3:56 PM

#

but don't just stick 7,20 in as it probably won't run
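
For comparison, a sketch using `itertools.product`, which was not proposed in the chat. The set posted above has 39 tuples, while the full set of ordered strings for 3 letters and length 4 should have 3**4 = 81 (for instance `babb` is missing from it). `product` with `repeat` yields all of them directly:

```python
import itertools
import string

unique_chars = 3
string_len = 4
alphabet = string.ascii_letters[:unique_chars]  # "abc"

# Every ordered string of the given length, repeats allowed.
all_strings = ["".join(t) for t in itertools.product(alphabet, repeat=string_len)]

print(len(all_strings))       # 81
print("babb" in all_strings)  # True
```

For 7 letters and length 20 the same call still returns a lazy iterator, but with 7**20 results it would have to be consumed incrementally and filtered rather than turned into a list.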

solar phoenix May 8, 2020, 3:56 PM

#

@jolly briar yeah I might see if it can run it on a supercomputer or something

jolly briar May 8, 2020, 3:57 PM

#

i think 5,20 will run 🤔

#

there's probably plenty of room for optimising the above, if it's salvageable

solar phoenix May 8, 2020, 3:58 PM

#

Think that I could use that and then just run it on a server somewhere. Thanks for this- cool solution

arctic cliff May 8, 2020, 6:57 PM

#

i need some help with xml parsing

wintry mural May 8, 2020, 10:55 PM

#

Did someone try using machine learning algorithms for stock/crypto trading

#

I've got a bit of free time so I want to try out to code something like trading bot as a personal project

#

So if someone worked on something similar to this, I would like to hear your experiences

blazing bridge May 9, 2020, 3:27 AM

#

For anyone new and looking into getting into Data Science and Machine Learning: we have made a YouTube channel related to Data Science and Machine Learning and it would mean a lot if you could check it out and, if you like it, please subscribe. https://www.youtube.com/channel/UCKaajyjktvduM6mmuBtAOyg

main narwhal May 9, 2020, 7:21 AM

#

Does TensorFlow take our Python code (in map function) and do something like JIT compiling? I try to set breakpoint with VS Code but it is not hit at all.🤔

📎 train-model.py_-_hiragana-recognition_-_Visual_Studio_Code_5_9_2020_2_20_57_PM.png

lapis sequoia May 9, 2020, 8:12 AM

#

Hi! Can I get help here with machine learning (time series)?

lapis sequoia May 9, 2020, 8:45 AM

#

Upload the question, so we can see...

lapis sequoia May 9, 2020, 9:10 AM

#

I need to make a machine learning model on the time series to predict the quality of communication.
In this dataset, I need to predict the “Y” column.

📎 unknown.png

#

I plotted a linear plot of y versus date, as well as ACF and PACF.

📎 unknown.png

#

The Dickey-Fuller criterion is 0.
What model can be built and how to determine the parameters for it? The dataset itself was collected over 14 days and contains ~ 7 million rows. I averaged the value over a period of 1 minute. The dataset currently contains 20,160 rows

crimson umbra May 9, 2020, 9:16 AM

#

can anyone help me with something regarding to converting DOB to age in an excel form?

spark pelican May 9, 2020, 10:16 AM

#

Anyone wanna help me with my data analytics assignment? - its about pandas and stuff

faint furnace May 9, 2020, 12:23 PM

#

📎 unknown.png

#

plt.pie(genrevotes1)

📎 unknown.png

#

i generated this by using the code. but i want the labels as well. how do i do that ?

spark stag May 9, 2020, 12:31 PM

#

@faint furnace you need to pass labels as an argument to plt.pie(), so if you create a list of labels, in your case ["Drama", "Comedy"...], then add the argument labels=labels inside plt.pie so it looks like plt.pie(genrevotes1, labels=labels), then it should show the categories. but just a warning, with that many categories the text may become cramped together at the smaller categories

#

btw you don't need to manually create the list if the data is in a dataframe, you just need to get the row names

faint furnace May 9, 2020, 12:33 PM

#

yes i know that but i want to create the plot directly through the series i have

#

genrevotes1. has 1 column as all the genres which I want as labels

faint furnace May 9, 2020, 1:14 PM

#

solved. i was able to do it by typing
labels=genrevotes1.index

cunning wadi May 9, 2020, 2:16 PM

#

Hey guys

#

Is there a simple way of changing a grid solving program from monte carlo approach to temporal difference learning

#

I have the full monte carlo approach code and just struggling to convert it
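
The original code isn't shown here, so this is only a generic sketch of the difference between the two update rules, assuming a tabular value function stored in a dict. The function names, the (state, reward) episode format, and the alpha/gamma defaults are all assumptions for illustration:

```python
def mc_update(values, episode, alpha=0.1, gamma=1.0):
    """Every-visit Monte Carlo: wait for the episode to end, then move each
    visited state's value toward the full return observed from that state.

    `episode` is a list of (state, reward) pairs, where reward is what was
    received on leaving that state.
    """
    g = 0.0
    for state, reward in reversed(episode):
        g = reward + gamma * g
        values[state] += alpha * (g - values[state])


def td0_update(values, state, reward, next_state, done, alpha=0.1, gamma=1.0):
    """TD(0): update after every single step, bootstrapping from the current
    estimate of the next state instead of waiting for the full return."""
    target = reward if done else reward + gamma * values[next_state]
    values[state] += alpha * (target - values[state])
```

In practice the conversion mostly means moving the update from "after the episode loop" to "inside the step loop" and replacing the return g with reward + gamma * values[next_state].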

split drift May 9, 2020, 3:56 PM

#

Hey,
is there a way to pad a ragged 2d numpy array into a matrix with zeroes?
like [[1,2], [1]] to [[1,2], [1,0]]?

spark stag May 9, 2020, 4:33 PM

#

i don't think so, because the 2 original lists are of different lengths, so it will give you an array containing lists (so you can't use numpy features on them)

#

you can probably do it with loops but I don't think there will be a nice numpy feature like .reshape unless you initialize the array with uniform dimensions
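
A minimal sketch of the loop approach: allocate a rectangular array of the fill value first, then copy each row in. pad_rows is a made-up helper name, not a numpy function:

```python
import numpy as np


def pad_rows(rows, fill=0):
    """Pad ragged rows with `fill` so they form a rectangular 2-D array."""
    width = max((len(row) for row in rows), default=0)
    out = np.full((len(rows), width), fill)
    for i, row in enumerate(rows):
        # Copy the row into the left part; the rest keeps the fill value.
        out[i, :len(row)] = row
    return out


padded = pad_rows([[1, 2], [1]])
```

Here padded is a regular 2x2 integer array, so the usual numpy operations (reshape, slicing, broadcasting) work on it.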

supple moon May 9, 2020, 7:06 PM

#

Is anyone interested in Stock Market algorithms?

uncut shadow May 9, 2020, 7:11 PM

#

wdym?

supple moon May 9, 2020, 7:14 PM

#

In the development and application of algorithms that trade the markets

silent swan May 9, 2020, 7:35 PM

#

wouldn't recommend it

supple moon May 9, 2020, 7:49 PM

#

why do you say that

calm pewter May 9, 2020, 7:52 PM

#

Is anyone interested in Stock Market algorithms?
@supple moon yep

flat quest May 9, 2020, 7:55 PM

#

its difficult to make an algorithm that will perform well with stock markets, there's so many factors
u'll have to have access to a large amount of quality data

calm pewter May 9, 2020, 7:55 PM

#

its difficult to make an algorithm that will perform well with stock markets, there's so many factors
u'll have to have access to a large amount of quality data
@flat quest And thats why it is fun to think about and play with, right?

supple moon May 9, 2020, 7:57 PM

#

i was looking to collaborate to see if we could build something good

calm pewter May 9, 2020, 7:57 PM

#

too early for me, but I'd like to see some links with concepts here 🙂

flat quest May 9, 2020, 8:01 PM

#

well it would be fun
but might become an unnaturally large project lol (will probably take up a lot of resources) and getting/cleaning data gets more frustrating the more you do it

merry ridge May 9, 2020, 10:01 PM

#

That's my main area of work although a fair number of resources have gotten shifted to pandemic modeling instead.

drifting umbra May 9, 2020, 10:40 PM

#

does anyone know how to use tensor processing unit (TPU) in Google Colab or Kaggle?

#

i am trying to use TPU and think i am following the example code exactly

#

but it is only using CPU

#

can share notebook

#

📎 unknown.png

#

https://www.kaggle.com/docs/tpu

Tensor Processing Units (TPUs) Documentation

#

im doing exactly this

#

please anyone know anything about TPU at all and tensorflow help me

#

📎 unknown.png

drifting umbra May 9, 2020, 11:41 PM

#

anyone at all please @ or PM me

flat quest May 10, 2020, 1:38 AM

#

are u using a layer / model compatible with tpu? @drifting umbra

drifting umbra May 10, 2020, 1:55 AM

#

@flat quest i believe so. keras Sequential, see this:

# instantiate a distribution strategy
tpu_strategy = tf.distribute.experimental.TPUStrategy(tpu)

# instantiating the model in the strategy scope creates the model on the TPU
with tpu_strategy.scope():
    model = Sequential()
    model.add(LSTM(256, input_shape=(X.shape[1], X.shape[2]), return_sequences=True))
    model.add(Dropout(0.2))
    model.add(LSTM(256))
    model.add(Dropout(0.2))
    model.add(Dense(y.shape[1], activation='softmax'))
    model.compile(loss='categorical_crossentropy', optimizer='adam')


# train model normally - doing this below
# model.fit(training_dataset, epochs=EPOCHS, steps_per_epoch=…)

model.summary()

#

starts as

import numpy
from keras.models import Sequential
from keras.layers import Dense
from keras.layers import Dropout
from keras.layers import LSTM
from keras.callbacks import ModelCheckpoint
from keras.utils import np_utils
import os
import tensorflow as tf
print("Tensorflow version " + tf.__version__)
...

#

i can upload file if that helps

#

notebook?

arctic canopy May 10, 2020, 2:10 AM

#

What's up guys... So I wanna learn ML and robotics, something like putting these 2 things together, but I don't know where to start or which one I should start with. Can someone pls give me some recommendations for tutorials, or advice?

flat quest May 10, 2020, 2:44 AM

#

LSTM layers I know are compatible with TPU by default. @drifting umbra
are u running this code?

# detect and init the TPU
tpu = tf.distribute.cluster_resolver.TPUClusterResolver()
tf.config.experimental_connect_to_cluster(tpu)
tf.tpu.experimental.initialize_tpu_system(tpu)

#

@arctic canopy well you should first learn the basics of each one separately. Not too sure for robotics but ML u should start with some sort of course to give you a good overview.

arctic canopy May 10, 2020, 2:52 AM

#

@flat quest So is it better to start with ML first?

#

and is it that complex? I mean I heard it needs a lot of math and stuff

drifting umbra May 10, 2020, 3:11 AM

#

@flat quest thank you and let me check

#

# detect and init the TPU
tpu = tf.distribute.cluster_resolver.TPUClusterResolver()
tf.config.experimental_connect_to_cluster(tpu)
tf.tpu.experimental.initialize_tpu_system(tpu)

#

that is exactly what i have

#

https://www.kaggle.com/reedda/text-generation-with-lstm-recurrent-neural-network/edit/run/33538356

Text Generation With LSTM Recurrent Neural Network

Explore and run machine learning code with Kaggle Notebooks | Using data from [Private Datasource]

#

can i link you this

flat quest May 10, 2020, 4:10 AM

#

yeah i'll take a look
@arctic canopy

#

it depends on where u want to go with ML

#

basic stuff like simple text generation and classifying images doesn't really need much knowledge with math

#

if you want to make a production ready model, or look into improving existing models through new architectures, then you'll have to learn the math to some extent

#

@drifting umbra don't think i have access to your data

arctic wedgeBOT May 10, 2020, 4:20 AM

#

Hey @drifting umbra!

Uh-oh! It looks like your message got zapped by our spam filter. We currently don't allow .txt attachments, so here are some tips to help you travel safely:

• If you attempted to send a message longer than 2000 characters, try shortening your message to fit within the character limit or use a pasting service (see below)

• If you tried to show someone your code, you can use codeblocks
(run !code-blocks in #bot-commands for more information) or use a pasting service like:

https://paste.pythondiscord.com

drifting umbra May 10, 2020, 4:20 AM

#

https://www.dropbox.com/s/4jxpeoqr2k6hssj/wonderland.txt?dl=0

Dropbox

wonderland.txt

Shared with Dropbox

#

just alice in wonderland txt

flat quest May 10, 2020, 4:31 AM

#

lol idk why but i can see the files listed in the input directory
but the os module cannot find it :/

drifting umbra May 10, 2020, 4:33 AM

#

hm

flat quest May 10, 2020, 4:47 AM

#

yeah its odd cause i can find the file using os.walk

arctic canopy May 10, 2020, 6:41 AM

#

@flat quest so basically as you go deeper it will require more math, I think. Thanks mate

south dagger May 10, 2020, 7:24 AM

#

Im stoked man just learned a bit of selenium, its so fun

lapis sequoia May 10, 2020, 7:30 AM

#

what?

south dagger May 10, 2020, 7:58 AM

#

learned how to use selenium with python to web scrape

vivid badge May 10, 2020, 10:57 AM

#

python

timber jolt May 10, 2020, 10:59 AM

#

vscode

cunning wadi May 10, 2020, 12:15 PM

#

hi guys

#

📎 unknown.png

#

How would i go about changing this into a temporal difference approach

solar phoenix May 10, 2020, 1:42 PM

#

Let's imagine I have a dataframe with an arbitrary length, and each row has 4 parameters. I want to rank the dataframe by these 4 parameters. At the moment what I do is test each of the parameters against a desired value and then rank them by how close they are to the value. I then add a new column to the dataframe which indicates which rank they are for that parameter. I do this for all 4 parameters then combine the rank of parameters 1-4 and get a "total" rank, with the closest to my desired set of parameters being at the top. Is there a better way to do this? My concern is that a row that has very good score for 3 parameters will constantly outcompete one that has a mediocre score in all 4 parameters.
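
One possible alternative to summing ranks, sketched under assumptions: score each row by how far it misses the target on each parameter (scaled by that column's spread so units are comparable), then combine the misses. The data, column names, and target values below are invented for illustration. Sorting by the worst miss is one way to stop a row that is great on three parameters but far off on the fourth from always winning:

```python
import pandas as pd

# Hypothetical data: four parameters per row and a desired value for each.
df = pd.DataFrame(
    {
        "p1": [1.0, 1.2, 1.0],
        "p2": [5.0, 5.5, 5.0],
        "p3": [10.0, 11.0, 10.0],
        "p4": [0.5, 0.6, 2.0],
    },
    index=["exact", "mediocre", "three_good_one_bad"],
)
target = pd.Series({"p1": 1.0, "p2": 5.0, "p3": 10.0, "p4": 0.5})

# Scale each parameter's miss by that column's standard deviation.
scaled_miss = (df - target).abs() / df.std()

df["mean_miss"] = scaled_miss.mean(axis=1)   # average closeness over all four
df["worst_miss"] = scaled_miss.max(axis=1)   # dominated by the single worst parameter
ranked = df.sort_values("worst_miss")
```

With this toy data, mean_miss still favours the three_good_one_bad row over the mediocre one, while worst_miss puts the mediocre row first. Which behaviour is "right" depends on what the ranking is for.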

jolly briar May 10, 2020, 2:34 PM

#

@solar phoenix might be easier to follow with example data

solar phoenix May 10, 2020, 2:35 PM

#

Hi @jolly briar I'll make an example and send it, cheers

jolly briar May 10, 2020, 2:36 PM

#

df.sample(20,random_state=1).to_json() might be useful

faint umbra May 10, 2020, 2:45 PM

#

Im quite new to kaggle. I have a model now running. Its gonna take 2 hours to make the version. Can I safely close the window?

#

Knowing that my version creation will finish? 😄

twin parcel May 10, 2020, 5:45 PM

#

Hey guys sorry if this is the wrong spot... thoughts on using a .txt or .xlsx to create a wordbank that i will use as an array to compare other parsed text to? im trying to automate my job search a bit

somber rune May 10, 2020, 6:11 PM

#

urgent help needed

#

on google colab

#

can i share screen

#

??

jolly briar May 10, 2020, 6:16 PM

#

@twin parcel why xlsx over csv?

twin parcel May 10, 2020, 6:17 PM

#

just kinda comparing them all, forgot csv was another type and prob the best option atm

jolly briar May 10, 2020, 6:17 PM

#

you can append to a csv

#

to_csv( mode = 'a') iirc
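
A small sketch of appending with to_csv. The file name and the "phrase" column are made up; header=False on the second call keeps the header row from being written twice:

```python
import os
import tempfile

import pandas as pd

# Hypothetical file location for the word bank.
path = os.path.join(tempfile.mkdtemp(), "wordbank.csv")

first = pd.DataFrame({"phrase": ["python", "sql"]})
more = pd.DataFrame({"phrase": ["pandas"]})

first.to_csv(path, index=False)                         # creates the file with a header
more.to_csv(path, mode="a", header=False, index=False)  # appends rows, no second header

wordbank = pd.read_csv(path)["phrase"].tolist()
```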

lapis sequoia May 10, 2020, 6:17 PM

#

completed numpy, pandas and matplotlib. now what next?

twin parcel May 10, 2020, 6:18 PM

#

also am i able to import csv filled with reg expressions by chance?

somber rune May 10, 2020, 6:18 PM

#

read ISL @lapis sequoia

jolly briar May 10, 2020, 6:18 PM

#

what does "completed" mean @lapis sequoia

flat quest May 10, 2020, 6:18 PM

#

what do you mean completed numpy pandas and matplotlib?

lapis sequoia May 10, 2020, 6:18 PM

#

Studied them

jolly briar May 10, 2020, 6:18 PM

#

@twin parcel can you give an example

#

@lapis sequoia what does studied mean here, to what extent

lapis sequoia May 10, 2020, 6:18 PM

#

what's the next step?

jolly briar May 10, 2020, 6:19 PM

#

explaining yourself properly would be a good step imo

lapis sequoia May 10, 2020, 6:19 PM

#

@lapis sequoia what does studied mean here, to what extent
@jolly briar covered these topics with practical data analysis

somber rune May 10, 2020, 6:19 PM

#

@jolly briar do you have any idea about auto encoders ?

jolly briar May 10, 2020, 6:19 PM

#

@somber rune sorry no 😦

#

@lapis sequoia you're being vague as hell so I'm just going to say go read ESL

somber rune May 10, 2020, 6:20 PM

#

@twin parcel do you ?

twin parcel May 10, 2020, 6:20 PM

#

Im trying to add "minimum of 6 years" to a list im going to filter out but id rather have a reg expression that would cover minimum of 6< years instead of writing "minimum of 6 years, minimum of 7 years, minimum of 8 years" but i also want to keep it imported from a file

#

Err not a clue besides for image and vid but i've used python for a total of 3 hours now so cant help much 😂

somber rune May 10, 2020, 6:21 PM

#

ah okay

flat quest May 10, 2020, 6:21 PM

#

@lapis sequoia lol that is still so vague

jolly briar May 10, 2020, 6:21 PM

#

@twin parcel tbh if you only have a fixed set of cases just writing them out and putting them in a list isn't really a bad idea

#

@twin parcel idk what the source is though - are they guaranteed to have this structure?

twin parcel May 10, 2020, 6:22 PM

#

only problem is im going to cover 6< and that could reach 20 and yes they will always have this structure

lapis sequoia May 10, 2020, 6:23 PM

#

I want to learn machine learning and data science. I am a self learner. Learned Python with OOP concepts, and topics like file handling, regex, web scraping, numpy, pandas and matplotlib. What's the next 3 things should I learn to get one step further? Should I learn scikit now?

twin parcel May 10, 2020, 6:23 PM

#

or if one falls out of it, at least i catch 7/10 and have that many fewer jobs to read over

jolly briar May 10, 2020, 6:23 PM

#

why would you store regex's in a csv rather than in the script? I don't really follow that

twin parcel May 10, 2020, 6:23 PM

#

because I want to make it simple for someone to adapt without touching the code too much. i would read the txt in a loop to use each regex inside it

jolly briar May 10, 2020, 6:24 PM

#

sounds funky

twin parcel May 10, 2020, 6:24 PM

#

it is 😂

jolly briar May 10, 2020, 6:24 PM

#

I don't think i'd store regexs like that

#

they should be in code, data in data

#

separation and all that jazz

#

without a concrete example idk what to say here, other than have your scripting in the script

flat quest May 10, 2020, 6:25 PM

#

in terms of ML
depends on what u want to do

surface level ML or if u just want to work with existing architectures, yeah scikit and tensorflow are a good place to go

if u want to improve existing models/architectures, you're gonna have to learn some aspect of the math. (online vids, courses, wiki are all good for that).

@lapis sequoia

jolly briar May 10, 2020, 6:26 PM

#

if u want to improve existing models/architectures
this seems like an extremely narrow set of people

twin parcel May 10, 2020, 6:26 PM

#

hmmm thinking on that, maybe ill format my txt differently: code the regex in but get the year value from the file

#

might use a txt, dedicate the first few lines to specific stats, then split after those lines on ,

flat quest May 10, 2020, 6:27 PM

#

not really. A lot of ML startups/companies have some focus on improving existing architectures. Most of the ML stuff we use now was invented in like the 80's.

jolly briar May 10, 2020, 6:27 PM

#

I think it's still very narrow, of the people that are going to learn these tools, that's a narrow set of people

lapis sequoia May 10, 2020, 6:28 PM

#

Can you guys tell me what 3 skills should I learn now? I'm trying to get into data science career with self learning

flat quest May 10, 2020, 6:28 PM

#

yeah, but if you plan to make a career or job out of it. The math can't be ignored.

jolly briar May 10, 2020, 6:28 PM

#

though it depends what improve means I guess - if it's publishing and stuff, very narrow

#

@lapis sequoia what do you think you should do, based on what you've looked at so far

flat quest May 10, 2020, 6:29 PM

#

performance of models. More data/cleaning/feature engineering does help, but only to a certain extent.

jolly briar May 10, 2020, 6:29 PM

#

yeah i think data engineering is more important for most

lapis sequoia May 10, 2020, 6:30 PM

#

Can you specify the topics ?

jolly briar May 10, 2020, 6:30 PM

#

esp. if someone is self learning

flat quest May 10, 2020, 6:33 PM

#

yeah

#

@lapis sequoia if u don't have any clue of any architectures. Start out with linear reg/logistic on scikit. Then try decision trees and gradient boosting.
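
A minimal sketch of that progression in scikit-learn, assuming synthetic data from make_classification so it runs on its own. The hyperparameters are arbitrary starting points, not recommendations:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic data so the sketch is self-contained; swap in a real dataset.
X, y = make_classification(n_samples=300, n_features=8, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

models = {
    "logistic regression": LogisticRegression(max_iter=1000),
    "decision tree": DecisionTreeClassifier(max_depth=3, random_state=0),
    "gradient boosting": GradientBoostingClassifier(random_state=0),
}

# Every estimator shares the same fit/score interface: held-out accuracy per model.
scores = {
    name: model.fit(X_train, y_train).score(X_test, y_test)
    for name, model in models.items()
}
```

Because the interface is identical across estimators, trying the next model is mostly a one-line change.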

lapis sequoia May 10, 2020, 6:37 PM

#

I'm trying to set a roadmap and create a curriculum on my own to continue learning from here.

flat quest May 10, 2020, 6:38 PM

#

well those are the topics u should look into

lapis sequoia May 10, 2020, 6:39 PM

#

Is this a good information? I'm almost following this. https://towardsdatascience.com/a-road-map-for-data-science-d1977504a72b

Medium

A Road Map for Data Science

What is Data Science?

jolly briar May 10, 2020, 6:39 PM

#

trying to plan everything out to the nth degree is the biggest waste of time

flat quest May 10, 2020, 6:39 PM

#

^

jolly briar May 10, 2020, 6:39 PM

#

those suggestions from drag are good, do them

#

Don't bother reading medium posts about planning about planning about planning

#

if you've been through np/pandas/mpl as much as you think trying a project is a good test

#

should be able to get some open data from a gov site, clean it, and provide some insights

#

I would bet it's more time consuming than you expect

flat quest May 10, 2020, 6:41 PM

#

yeah got stuck on that for a while too
u never get out of the phase of planning to do something

u need to just start working with ML.

#

yeah cleaning data takes longer than model building imo

lapis sequoia May 10, 2020, 6:42 PM

#

I actually downloaded some datasets from Kaggle and presented visual representation well for practice. Is that all what data analyzing mean?

jolly briar May 10, 2020, 6:42 PM

#

how old are you?

#

because i'm not sure why these questions are phrased as they are

#

Is that all what data analyzing mean?
this question just seems nuts

flat quest May 10, 2020, 6:44 PM

#

presenting means nothing. You need to be able to extract information from it @lapis sequoia

lapis sequoia May 10, 2020, 6:44 PM

#

like?

jolly briar May 10, 2020, 6:44 PM

#

What do you think?

#

do you have any ideas / thoughts of your own?

flat quest May 10, 2020, 6:45 PM

#

maybe ur familiar with this dataset, the kaggle titanic.

Let's say u see that there's a cluster of deaths with people who have the same last name

jolly briar May 10, 2020, 6:45 PM

#

because this is probably the first thing I'd learn, you're not going to be given a 3 step guide to anything at work

flat quest May 10, 2020, 6:45 PM

#

then you might infer that those people are part of the same family group, and family groups are likely to all die or survive.
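
A sketch of how that kind of check could look in pandas. The rows are invented; only the "Name" and "Survived" column names and the "Surname, Title. First" name format follow the Kaggle Titanic data:

```python
import pandas as pd

# Made-up passengers in the Titanic name format "Surname, Title. First".
passengers = pd.DataFrame(
    {
        "Name": [
            "Marsh, Mr. A", "Marsh, Miss. B", "Marsh, Mrs. C",
            "Lone, Mr. D",
            "Pell, Mr. E", "Pell, Mrs. F",
        ],
        "Survived": [0, 0, 0, 1, 1, 1],
    }
)

# Everything before the comma is the surname.
passengers["Surname"] = passengers["Name"].str.split(",").str[0].str.strip()

# Group size and survival rate per surname; keep only groups of 2 or more.
by_family = passengers.groupby("Surname")["Survived"].agg(["size", "mean"])
families = by_family[by_family["size"] > 1]

# A survival rate of exactly 0 or 1 means the whole group shared one fate.
all_same_fate = families["mean"].isin([0.0, 1.0])
```

On the real data a shared surname does not guarantee a shared family, so this would be a starting point for a feature, not a conclusion.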

lapis sequoia May 10, 2020, 6:47 PM

#

I played with some other dataset as well analyzing different scenarios

flat quest May 10, 2020, 6:49 PM

#

aight idk the only advice i can really give

Is to dive into your data and then work with it. Like rie said make a project utilizing the python data-packages and publish the results on a website or on github.

lapis sequoia May 10, 2020, 6:50 PM

#

I don't really understand what type of projects it can be. Can you generalize it?

jolly briar May 10, 2020, 6:51 PM

#

@lapis sequoia have a guess

#

just have a guess at something that shows you have thought for yourself and go from there, maybe it's a good idea

lapis sequoia May 10, 2020, 6:52 PM

#

should be able to get some open data from a gov site, clean it, and provide some insights
@jolly briar like this?

jolly briar May 10, 2020, 6:53 PM

#

using my thought as your own, sure

#

given what you say you have learnt - my biggest concern would be that you don't seem to be able to piece anything together for yourself

#

which suggests that perhaps you've rattled through a few tutorials without digesting / internalising any of it

twin parcel May 10, 2020, 6:54 PM

#

using my thought as your own, sure
@jolly briar sounds like every coding meeting ive been a part of...

jolly briar May 10, 2020, 6:54 PM

#

😄

#

i never did understand that regex thing you were talking about @twin parcel , personally i'd always go for having them in the script, and having data as its own thing

#

you can just extract the numbers if that's easier , re.search(r'(\d+)', sentence).group(1) looks like it'd catch what you needed

twin parcel May 10, 2020, 6:58 PM

#

I decided to go that route. im gonna research regex in python, but i added a placeholder in the .txt to fill so others can change it and ill just read the number: "Minimum Years: 7" is line 2 and im able to grab the 7 easily, so ill use that var in the regex and it should work out 🙂
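
A rough sketch of that plan, assuming the config line looks like "Minimum Years: 7" and the job text uses the phrase "minimum of N years". The pattern and both function names are illustrative only. The idea is to capture the number and compare it as an integer, instead of writing a separate pattern per year:

```python
import re

# Captures N in phrases like "minimum of 6 years", "Minimum of 10+ years", "minimum of 1 year".
MIN_YEARS_PATTERN = re.compile(r"minimum of (\d+)\+? years?", re.IGNORECASE)


def read_limit(config_line):
    """Pull the number out of a config line such as 'Minimum Years: 7'."""
    return int(config_line.split(":", 1)[1])


def asks_too_many_years(text, limit):
    """True if the text asks for `limit` or more years of experience."""
    return any(int(n) >= limit for n in MIN_YEARS_PATTERN.findall(text))


limit = read_limit("Minimum Years: 7")
skip_posting = asks_too_many_years("Requires a minimum of 8 years experience", limit)
```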

#

soon i will have a tool to not waste time on the job search

flat quest May 10, 2020, 7:00 PM

#

share me that tool when u make it 😛

jolly briar May 10, 2020, 7:00 PM

#

📎 is_it_worth_the_time.png

twin parcel May 10, 2020, 7:01 PM

#

Figure ive looked once a day for last few months alone switching tabs prob is around 5 mins a day

#

so far i have it take the first 10 pages of indeed results for my search and print them on one page. next step is to filter out requirements im nowhere near as a new grad, which saves several extra minutes a day of checking whether i qualify. saving it all to a file is the last step. figure at 22 i have a good 40 years of career, so this tool can be used later on too

#

also another big benefit is i can now add python to my portfolio as i wanted to make a scraper with it for a while but couldnt find a legal use

jolly briar May 10, 2020, 7:05 PM

#

what's indeed's policy on scraping?

#

showing off a tool that shows you haven't read a data usage policy might be a bit brave 😅

twin parcel May 10, 2020, 7:06 PM

#

ah also true 😩

#

imma read up on that

twin parcel May 10, 2020, 7:22 PM

#

If I read its TOS and other google searches correctly, its ok as long as its not for commercial use, but i think imma email their customer service before this is the center of my git or im IP banned from a job site 😂

thin remnant May 10, 2020, 7:36 PM

#

i have a directory notebooks that contains the fastai directory; inside this notebooks folder i also have a folder for each week of exercises. The notebooks in those week folders can't find the fastai directory because of a complaint about relative paths, could someone help me out?

#

📎 unknown.png

jolly briar May 10, 2020, 7:45 PM

#

@thin remnant if you run things from project root it makes stuff like this a lot simpler

#

so in your notebook you can have something like os.chdir('../../') , or better something like os.chdir(here()) using here() from pyprojroot

#

here's a link to that : https://github.com/chendaniely/pyprojroot

thin remnant May 10, 2020, 7:47 PM

#

is it not possible to access the fastai modules by using just a path

#

it's only one directory up

#

i think the os.chdir('../') worked

jolly briar May 10, 2020, 7:48 PM

#

it's only one directory up
doesn't change anything, running from proj root is still simpler?
so you can't, from an interactive session in root, do from blah import blah where blah is what you want

#

i think the os.chdir('../') worked
right, but if you re-run it then it'll keep knocking you back (you'll have to reset kernel)

thin remnant May 10, 2020, 7:49 PM

#

would appending fastai to the pythonpath be a solution?
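
On the pythonpath question, a stdlib-only sketch of what that could look like inside a notebook. The helper name is made up. Note that the folder to add is the one that contains the fastai package (here the parent notebooks folder, one level up), not the fastai folder itself:

```python
import sys
from pathlib import Path


def add_to_import_path(directory):
    """Put `directory` at the front of sys.path (once) so imports resolve there."""
    resolved = str(Path(directory).resolve())
    if resolved not in sys.path:
        sys.path.insert(0, resolved)
    return resolved


# In a notebook one level below the folder that contains fastai/ :
project_root = add_to_import_path("..")
```

Unlike os.chdir('../'), re-running this cell does not keep moving one directory further up, because the path is resolved once and only added if missing.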

jolly briar May 10, 2020, 7:49 PM

#

better to use here()

#

i've given what i think is a good solution

thin remnant May 10, 2020, 7:49 PM

#

it says here() not defined

jolly briar May 10, 2020, 7:50 PM

#

because you haven't installed the package I linked i guess

thin remnant May 10, 2020, 7:50 PM

#

should i also place that in the same directory as the fastai directory

jolly briar May 10, 2020, 7:50 PM

#

the package? You'd just install with pip

#

do you have this project version controlled?

thin remnant May 10, 2020, 7:51 PM

#

📎 unknown.png

#

like this

#

i didn't do any pip install

jolly briar May 10, 2020, 7:51 PM

#

I've no idea what that is

thin remnant May 10, 2020, 7:51 PM

#

pyprojroot

#

thats what u linked me

#

i cloned that from git

jolly briar May 10, 2020, 7:52 PM

#

why would you clone?

thin remnant May 10, 2020, 7:52 PM

#

how do i install then

jolly briar May 10, 2020, 7:52 PM

#

pip

thin remnant May 10, 2020, 7:52 PM

#

pip install pyprojroot?

jolly briar May 10, 2020, 7:52 PM

#

📎 unknown.png

thin remnant May 10, 2020, 7:53 PM

#

here() is still not defined :/

jolly briar May 10, 2020, 7:53 PM

#

have you followed the readme

thin remnant May 10, 2020, 7:56 PM

#

yes

#

the pyprojroot import here doesnt work

#

no module named pyprojroot

jolly briar May 10, 2020, 7:57 PM

#

are you in jupyter notebook or lab

thin remnant May 10, 2020, 7:57 PM

#

jupyter

jolly briar May 10, 2020, 7:57 PM

#

yeah, which

thin remnant May 10, 2020, 7:57 PM

#

notebook

#

conda

jolly briar May 10, 2020, 7:57 PM

#

you need to restart the kernel

thin remnant May 10, 2020, 7:57 PM

#

i did

#

jolly briar May 10, 2020, 7:57 PM

#

you installed it in the wrong env then

thin remnant May 10, 2020, 7:58 PM

#

do i have to install in the fastai-cpu env ?

jolly briar May 10, 2020, 7:58 PM

#

if that's what you're using for this notebook then yeah

thin remnant May 10, 2020, 7:58 PM

#

i think so

#

i'll try

jolly briar May 10, 2020, 7:58 PM

#

you have to install things for particular envs, you can use requirements files and such to manage this for you

#

make sure you install to that env, should then work

#

also - i tend to use it as os.chdir(here()), just at the top of the notebook

thin remnant May 10, 2020, 7:59 PM

#

import here works now

jolly briar May 10, 2020, 7:59 PM

#

cool

#

so i usually have something like

import os
from pyprojroot import here
os.chdir(here())
<other imports>

thin remnant May 10, 2020, 8:00 PM

#

it still doesnt find fastai

jolly briar May 10, 2020, 8:00 PM

#

which is an odd ordering, i just don't want to use here() throughout the script

thin remnant May 10, 2020, 8:00 PM

#

📎 unknown.png

jolly briar May 10, 2020, 8:00 PM

#

import os
from pyprojroot import here
os.chdir(here())

do this, then os.listdir(), is it in the root?

thin remnant May 10, 2020, 8:00 PM

#

yes

#

it's in the root

#

should i put the entire path to the project dir ?

jolly briar May 10, 2020, 8:01 PM

#

idk why it wouldn't work when previously the os.chdir('../') stuff did

lapis sequoia May 10, 2020, 8:01 PM

#

hi

thin remnant May 10, 2020, 8:01 PM

#

meh ill just make a path variable

#

not that big of a deal

jolly briar May 10, 2020, 8:01 PM

#

shouldn't have to

thin remnant May 10, 2020, 8:02 PM

#

mmm

#

this is weird

#

📎 unknown.png

#

@jolly briar what should i do when the dir is changed

#

it still doesn't load the import of fastai

jolly briar May 10, 2020, 8:06 PM

#

@thin remnant looks like it's put you into your ~/ dir, which i doubt is the project root

#

is it?

thin remnant May 10, 2020, 8:06 PM

#

nope ..

jolly briar May 10, 2020, 8:07 PM

#

in your project root git init it

#

git init

#

don't do that in your ~/ dir

thin remnant May 10, 2020, 8:07 PM

#

nvm

#

this will do

#

📎 unknown.png

jolly briar May 10, 2020, 8:08 PM

#

🤷‍♂️

heavy night May 10, 2020, 9:31 PM

#

I'm trying to generate plot points for a 3d scatter plot. I have the values, but being new to python, numpy, pandas, etc., I'm not sure if I'm capturing and structuring the data in the most simplified way for plotting. Here is my code:

sample_data_subset_intervals = np.unique(sample_data_subset_df['sampling_interval'].to_numpy())
sample_data_subset_durations = np.unique(sample_data_subset_df['sampling_duration'].to_numpy())

scatterplot_raw_data_df = \
    (sample_data_subset_df[['sampling_interval','sampling_duration','sampling_error']]).dropna()
scatterplot_raw_data_df['sampling_error'] = scatterplot_raw_data_df['sampling_error'].abs()

scatterplot_3d_plot_points_dtype = \
    [('sampling_interval', np.int32), ('sampling_duration', np.int32), ('sampling_error', np.float64)]
scatterplot_3d_plot_points = np.empty([0,1],dtype=scatterplot_3d_plot_points_dtype)
plot_points_index = 0

for interval in sample_data_subset_intervals:
    for duration in sample_data_subset_durations:
        if duration <= interval:
            interval_duration_pair_data_subset_df = \
                scatterplot_raw_data_df[(scatterplot_raw_data_df['sampling_interval']==interval) & \
                                        (scatterplot_raw_data_df['sampling_duration']==duration)]
            idp_sampling_error_summation = interval_duration_pair_data_subset_df['sampling_error'].sum()
            idp_mean_sampling_error = \
                idp_sampling_error_summation / len(interval_duration_pair_data_subset_df.index)
            scatterplot_3d_plot_points.resize(plot_points_index + 1,1)
            scatterplot_3d_plot_points[plot_points_index]=(interval,duration,idp_mean_sampling_error)
            plot_points_index = plot_points_index + 1

#

and the output looks like this:

[[(  10,   10, 0.00000000e+00)]
 [(  30,   10, 4.56183120e-04)]
 [(  30,   30, 0.00000000e+00)]
 [(  60,   10, 2.84578755e-03)]
 [(  60,   30, 1.92741648e-03)]
 [(  60,   60, 0.00000000e+00)]
 [( 120,   10, 1.33025818e-01)]
 [( 120,   30, 1.21143218e-01)]
 [( 120,   60, 9.39393846e-02)]
 [( 120,  120, 0.00000000e+00)]
 [( 300,   10, 7.69409264e-01)]
 [( 300,   30, 7.70362944e-01)]
 [( 300,   60, 7.38203127e-01)]
 [( 300,  120, 5.79511920e-01)]
 [( 300,  300, 0.00000000e+00)]
 [( 600,   10, 1.18857403e+00)]
 [( 600,   30, 1.18091259e+00)]
 [( 600,   60, 1.16379460e+00)]
 [( 600,  120, 1.02220597e+00)]
 [( 600,  300, 6.36643452e-01)]
 [( 600,  600, 0.00000000e+00)]
 [( 900,   10, 1.38186398e+00)]
 [( 900,   30, 1.41657535e+00)]
 [( 900,   60, 1.42654824e+00)]
 [( 900,  120, 1.28564349e+00)]
 [( 900,  300, 9.52358564e-01)]
 [( 900,  600, 4.13780964e-01)]
 [( 900,  900, 0.00000000e+00)]
 [(1800,   10, 1.56350134e+00)]
 [(1800,   30, 1.59038708e+00)]
 [(1800,   60, 1.57760143e+00)]
 [(1800,  120, 1.47674187e+00)]
 [(1800,  300, 1.27458568e+00)]
 [(1800,  600, 9.84249018e-01)]
 [(1800,  900, 7.20700696e-01)]
 [(1800, 1800, 0.00000000e+00)]
 [(3600,   10, 1.58364303e+00)]
 [(3600,   30, 1.62856429e+00)]
 [(3600,   60, 1.66236178e+00)]
 [(3600,  120, 1.67353265e+00)]
 [(3600,  300, 1.47160299e+00)]
 [(3600,  600, 1.39347321e+00)]
 [(3600,  900, 1.18549807e+00)]
 [(3600, 1800, 7.73267790e-01)]
 [(3600, 3600, 0.00000000e+00)]]

The number of brackets and parens in the output implies to me perhaps unnecessary complexity in my data structure, but that may just be due to me being unfamiliar with structuring data in python/numpy. Does the format/structure of this output look correct and most simplified for moving forward with it to plot? Thanks!
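
The doubled brackets come from the array having shape (n, 1) with a structured dtype: each row holds a single record, which prints as a tuple. One possibly simpler route, sketched here with a tiny invented stand-in for sample_data_subset_df, is to let a pandas groupby compute the per-pair mean and then pull out three flat arrays:

```python
import pandas as pd

# Hypothetical stand-in for sample_data_subset_df.
sample_data_subset_df = pd.DataFrame(
    {
        "sampling_interval": [30, 30, 30, 60, 60],
        "sampling_duration": [10, 10, 30, 10, 10],
        "sampling_error": [-0.002, 0.004, 0.0, 0.01, None],
    }
)

plot_points = (
    sample_data_subset_df[["sampling_interval", "sampling_duration", "sampling_error"]]
    .dropna()
    .assign(sampling_error=lambda d: d["sampling_error"].abs())
    .query("sampling_duration <= sampling_interval")
    # One row per (interval, duration) pair holding the mean absolute error.
    .groupby(["sampling_interval", "sampling_duration"], as_index=False)["sampling_error"]
    .mean()
)

# Three flat 1-D arrays, one per axis of the 3-D scatter.
xs = plot_points["sampling_interval"].to_numpy()
ys = plot_points["sampling_duration"].to_numpy()
zs = plot_points["sampling_error"].to_numpy()
```

These can then be passed as the x, y, and z arguments of a 3-D scatter call, with no resizing or manual index bookkeeping.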

surreal flume May 11, 2020, 7:39 AM

#

Hi, I am getting really frustrated, because the changes I am making to a dataframe, inside a function, are not committed outside the function. I use return, but it does not work. Am I missing something ?

#

to be more specific, I wrote a function that takes a column away from the df, and that merges it with another df. The function output is correct i.e. a new df that looks exactly like I want. However, I would like this df to overwrite the original one, and I can't make it work

uncut shadow May 11, 2020, 8:41 AM

#

well, technically it should look like that

data = pd.DataFrame(data={"me": [1, 2], "something": [3, 4]})
data = function(data)

#

and this function would look like

def function(data):
  # do something with this data
  return data # or some other variable if you want
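
A fuller sketch of the same idea for the drop-a-column-then-merge case described above. The frames, column names, and the swap_column helper are invented. The point is that the function builds a new DataFrame, so the caller has to assign the result back to the name:

```python
import pandas as pd


def swap_column(df, other, drop_col, key):
    """Return a new frame: `drop_col` removed, then `other` merged in on `key`."""
    return df.drop(columns=[drop_col]).merge(other, on=key, how="left")


df = pd.DataFrame({"id": [1, 2], "old": ["a", "b"]})
other = pd.DataFrame({"id": [1, 2], "new": [10, 20]})

swap_column(df, other, "old", "id")       # result discarded, df is unchanged
unchanged_columns = list(df.columns)

df = swap_column(df, other, "old", "id")  # rebinding the name is what "overwrites" df
```

Assigning to the parameter name inside the function only rebinds a local variable, which is why the change never shows up outside unless the return value is assigned.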

hard fiber May 11, 2020, 9:30 AM

#

is it usual for pandas to sometimes replace (numeric) values?
because i have a dataframe where, when i visualize it, some values show up as 0 which shouldn't be there. i checked the file i also write it to, and all values in it are correct

#

def visualize(dataFrames:dict, outputLocation:str, showing:dict, show = True, save = True) -> None:
    print("Start visualizing...")
    yValues = []
    print("- Start Extracting what to show...")
    for key, val in showing.items():
        if (val):
            print("- - Adding item {}...".format(key))
            yValues.append(key)
    print("- Finished Extracting what to show")

    print("- Start iterating plots...")
    for id, df in dataFrames.items():
        print("- - Starting plot: "+id+"...")
        print("- - - Start converting Duration to numeric values...")
        df.index += 1
        df.DurationIncl = pd.to_numeric(df.DurationIncl)
        #df.ScanTimeAutoLight = pd.to_numeric(df.ScanTimeAutoLight)
        print("- - - Finished converting Duration to numeric values...")
        print("- - - Start plotting...")
        plottedFrame = df.plot(
            y = yValues,
            kind = "line",
            title = "Runtime analysis",
            use_index = True,
            grid = True
        )
        print("- - - Finished plotting")
        print("- - - Start adding legend...")
        legend = []
        for yval in yValues:
            legend.append(yval + " of " + id)
        plottedFrame.legend(legend)
        plottedFrame.set_xlabel("index")
        plottedFrame.set_ylabel("Time in seconds")
        print("- - - Finished adding legend")
        if save:
            print("- - - Start saving...")
            matplotlib.pyplot.savefig(outputLocation+"AnalysedData{}.png".format(id))
            print("- - - Finished saving")
        print("- - finished plot: "+id)
    print("- Finished iterating plots")
    if show:
        print("- Start showing...")
        matplotlib.pyplot.show()
        print("- Finished showing")
    print("Finished visualizing")

the code of the visualization

lapis sequoia May 11, 2020, 9:55 AM

#

Hi!

#

How to install fbprophet on win10 for python 3.8?

#

I searched for manuals and tried them - all in vain

polar acorn May 11, 2020, 10:57 AM

#

Never tried with windows, but it worked for me on macOs using conda.

frail ocean May 11, 2020, 11:06 AM

#

In windows10 python3.8 works fine with anaconda.

Hi!
@lapis sequoia in windows10, python3.8 works fine with anaconda

twin parcel May 11, 2020, 12:45 PM

#

while using a scraper, sites that store cookies and login sessions would the scraper use that session, or as a scraper it has its own session?

#

i assume towards its own just like different browsers

hearty holly May 11, 2020, 1:27 PM

#

@twin parcel indeed

twin parcel May 11, 2020, 1:34 PM

#

my old version worked pretty well until I realized some divs had an optional field that i want to track, so now i have to rebuild based on divs.

lapis sequoia May 11, 2020, 2:19 PM

#

https://www.coursera.org/specializations/jhu-data-science is this course good to get started in data science?

Coursera

Data Science | Coursera

Learn Data Science from Johns Hopkins University. Ask the right questions, manipulate data sets, and create visualizations to communicate results. This Specialization covers the concepts and tools you'll need throughout the entire data science ...

#

In windows10 python3.8 works fine with anaconda.
@lapis sequoia in windows10, python3.8 works fine with anaconda
@frail ocean I need the FBProphet lib

frail ocean May 11, 2020, 2:21 PM

#

I see.. then I have no idea. Sorry.

twin parcel May 11, 2020, 2:35 PM

#

Any suggestions on changing this to allow 2 chars?
minExperienceLimit = badFiltersContent[2][15]
I'm using it to grab the int from this string, but this won't work for >9
```
This is the Bad filter list, add words or phrases that make a job more likely not match, seperate with commas!

Minimum Years: 7
```
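
One possible way around the fixed character index, sketched under the assumption that the line always has the form `Minimum Years: <number>`. The helper name `parse_min_years` is made up for illustration and is not part of the original script.

```python
import re


def parse_min_years(line):
    """Return the integer after 'Minimum Years:', or None if it is missing."""
    # \d+ matches one or more digits, so 7 and 12 both work.
    match = re.search(r"Minimum Years:\s*(\d+)", line)
    return int(match.group(1)) if match else None


print(parse_min_years("Minimum Years: 7"))
print(parse_min_years("Minimum Years: 12"))
```

Splitting on the colon, `int(line.split(":")[1])`, would also work as long as the line contains exactly one colon.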

kind saddle May 11, 2020, 4:23 PM

#

i dont know if this fits here but i need to compare 2 numpy arrays with each other, they have different sizes. if 1 of the colors in the first array is found in the big array i need to have a true output

#

i tried allclose but that works for all, i tried isclose, i tried any(isclose)

#

i tried if A in B is true then:

uncut shadow May 11, 2020, 4:27 PM

#

wdym by true output?

kind saddle May 11, 2020, 4:40 PM

#

i dont need to know the value that matched

#

just that they did match

#

a boolean

#

@uncut shadow

tacit spruce May 11, 2020, 4:41 PM

#

What is the best resource for learning regression and classification in Python?

uncut shadow May 11, 2020, 4:42 PM

#

@kind saddle I think this should give you the way to do this (https://stackoverflow.com/questions/25490641/check-how-many-elements-are-equal-in-two-numpy-arrays-python). You can change it to what you actually wanted to achieve

Stack Overflow

check how many elements are equal in two numpy arrays python

I have two numpy arrays with number (Same length), and I want to count how many elements are equal between those two array (equal = same value and position in array)

A = [1, 2, 3, 4]
B = [1, 2, 4,...

kind saddle May 11, 2020, 4:43 PM

#

it works too for different sizes?

uncut shadow May 11, 2020, 4:43 PM

#

well, gimme a sec

kind saddle May 11, 2020, 4:44 PM

#

im asking a lot sorry, ive been brainstorming and testing so much that all my ideas ran out :/

uncut shadow May 11, 2020, 4:44 PM

#

well, this one should do
https://stackoverflow.com/questions/45936138/check-how-many-numpy-array-within-a-numpy-array-are-equal-to-other-numpy-arrays

Stack Overflow

Check how many numpy array within a numpy array are equal to other ...

My problem

Suppose I have

a = np.array([ np.array([1,2]), np.array([3,4]), np.array([5,6]), np.array([7,8]), np.array([9,10])])
b = np.array([ np.array([5,6]), np.array([1,2]), np.array([3,192])])

kind saddle May 11, 2020, 4:58 PM

#

@uncut shadow 2 questions but 1 is decently stupid, if it found a match then my if statement should say if value > 0 is true right?
second question is even tho the arrays are the same partly i dont get a match, can i put in an error margin for like +-3?

uncut shadow May 11, 2020, 5:01 PM

#

well, I don't know much about this particular package (I didn't need it before) but you should check its documentation to see if you can add a margin or stuff like that

kind saddle May 11, 2020, 5:01 PM

#

but to my original arrays before i feed them into the package

#

nvm this is overthinking it too much

uncut shadow May 11, 2020, 5:02 PM

#

actually, what do you mean with the first question cuz I don't think I understood it right

kind saddle May 11, 2020, 5:03 PM

#

the problem is, 2 pictures are translated into arrays, 1 is a small part out of the bigger one. so if i compare i should get a value or anything apart from 0 or whatever since they are the same and treated the same in the code

#

question 1 was, that the count of matches would be greater than 0 if there was a match

#

im sure that is yes xD

uncut shadow May 11, 2020, 5:05 PM

#

so yes, if there is a match then it should be bigger than 0

kind saddle May 11, 2020, 5:07 PM

#

i think the problem is in the processing or asking too many matches. ill try it with 1 single color first
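
A rough sketch of the "does any color match, within a margin" check discussed above. It assumes both images are NumPy arrays whose last axis holds the color channels (for example RGB); the function name `any_color_match` and the default `tol=3` are illustrative choices.

```python
import numpy as np


def any_color_match(small, big, tol=3):
    """True if any color in `small` is within `tol` per channel of a color in `big`."""
    small = np.asarray(small)
    big = np.asarray(big)
    # Flatten to (n_pixels, channels); cast so uint8 subtraction cannot wrap around.
    a = small.reshape(-1, small.shape[-1]).astype(np.int32)
    b = big.reshape(-1, big.shape[-1]).astype(np.int32)
    # Drop duplicate colors to keep the pairwise comparison small.
    a = np.unique(a, axis=0)
    b = np.unique(b, axis=0)
    # Broadcast to (len(a), len(b), channels) and compare channel by channel.
    close = np.abs(a[:, None, :] - b[None, :, :]) <= tol
    return bool(close.all(axis=2).any())
```

The result is a plain boolean, so it can go straight into an `if`. The pairwise comparison needs memory proportional to the number of unique colors in both images multiplied together, so very colorful images may need chunking.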

jolly briar May 11, 2020, 7:13 PM

#

@kind saddle do you have example data

kind saddle May 11, 2020, 7:15 PM

#

@kind saddle do you have example data
@jolly briar yes and no, not a raw file its converted from image to array

brisk moth May 11, 2020, 8:02 PM

#

is this where the NLTK nerds are

uncut shadow May 11, 2020, 8:04 PM

#

yes

brisk moth May 11, 2020, 8:06 PM

#

you know how to parse feature based semantics

agile cypress May 11, 2020, 11:20 PM

#

Wdym?

weary ferry May 12, 2020, 1:13 AM

#

define semantics

brisk moth May 12, 2020, 1:47 AM

#

uh

#

i have a CFG with fol and lambda calculus and i give it a sentence and it tokenizes and parses the tree for it with a semantic representation

#

but it does not work with certain constructions, like subject inverted ditransitive questions where the recipient is a prepositional phrase “for x”

trail parcel May 12, 2020, 3:37 AM

#

For beginners 😊 https://youtu.be/38KOhekzEgA

YouTube

ProgrammingHut

Addition using Neural Network | neural networks | [github] | begin...

can neural network add two numbers. In this video i tried something different for practise. Here i crated video for addition of two numbers using the artificial neural networks. whole code you can find in below github link.

code: https://gist.github.com/Pawandeep-...

▶ Play video

flat quest May 12, 2020, 6:27 AM

#

did you really use a 5 layer dense model for addition :/

trail parcel May 12, 2020, 8:02 AM

#

@flat quest it worked better than less number

#

Its not that intensive

flat quest May 12, 2020, 4:15 PM

#

if ur doing addition a single perceptron will work

#

at an equal or higher efficiency
just consider the mathematical basis of the perceptron: x1w1 + x2w2 + b

set b to 0 and w1 and w2 to 1 and you have addition.
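
As a minimal sketch of that point, assuming a single unit with no activation function (identity output):

```python
def perceptron(x1, x2, w1=1.0, w2=1.0, b=0.0):
    """Single linear unit with no activation: x1*w1 + x2*w2 + b."""
    return x1 * w1 + x2 * w2 + b


print(perceptron(2, 3))  # 5.0
```

With `w1 = w2 = 1` and `b = 0` nothing needs to be trained, which is why extra dense layers add no capacity for this task.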

rustic igloo May 12, 2020, 4:26 PM

#

Anyone knows where to find the source code for "vocab_file" for (thanks):


import bert  # assumed to be the bert-for-tf2 package
import tensorflow_hub as hub

FullTokenizer = bert.bert_tokenization.FullTokenizer
bert_layer = hub.KerasLayer("https://tfhub.dev/tensorflow/bert_en_uncased_L-12_H-768_A-12/1", trainable=False)

vocab_file = bert_layer.resolved_object.vocab_file.asset_path.numpy() #The vocab file of bert for tokenizer
tokenizer = FullTokenizer(vocab_file)

#

i could only find the source code of the KerasLayer and the resolved_object method, but no vocab_file nor asset_path methods/attributes...

paper niche May 12, 2020, 4:58 PM

#

https://github.com/tensorflow/models/blob/master/official/nlp/bert/export_tfhub.py#L83

GitHub

tensorflow/models

Models and examples built with TensorFlow. Contribute to tensorflow/models development by creating an account on GitHub.

#

it seems vocab_file is an attribute dynamically set in the export_bert_tfhub function

#

bert_layer.resolved_object.vocab_file is an Asset object, https://github.com/tensorflow/tensorflow/blob/v2.2.0/tensorflow/python/training/tracking/tracking.py#L280-L330, where you can see the asset_path property being defined (it returns a tf tensor/string, thus the .numpy() at the end)

GitHub

tensorflow/tensorflow

An Open Source Machine Learning Framework for Everyone - tensorflow/tensorflow

#

@rustic igloo

rustic igloo May 12, 2020, 5:17 PM

#

Thanks @paper niche

orchid tinsel May 12, 2020, 5:19 PM

#

@worn stratus hey man thanks for the reply!, im just wondering do you know where i can find the study materials for decision trees, i need to know about the classification and regression type specifically. i need to make them without sklearn or other 3rd party modules. all i can find in the web were those with them involved.

#

honestly im at a loss lol

worn stratus May 12, 2020, 5:21 PM

#

Hm - I've had a look around for that stuff myself to not great success. If I were you, I'd have a look around at stuff like the Wikipedia page for ID3 decision trees, and random implementations you can find on github. When I looked - they were all a bit rubbish, but still helpful for figuring out roughly what is required

#

From a quick google search, there are tutorials out there for things like building an ID3 decision tree - just they're all not great. With some patience you can probably use a combination of the different resources available to figure things out

#

ID3 is definitely what I'd be looking towards

torpid ingot May 12, 2020, 5:26 PM

#

I have a dataset of images, with each image tagged with keywords. For example, a photo of the Earth from space may have the keywords ["earth", "space", "planet", "photo"] while an illustrated diagram of the Sun may have the keywords ["sun", "star", "illustration", "rendering", "diagram", "labeled"].

I want to be able to automatically filter a set of images with keywords to end up with a smaller set of images that match the aesthetic that I am looking for. To do this, my plan is to figure out the weight of each keyword in determining if a particular image should or should not be included.

I will have a human go through the dataset of images, and say if they should or should not be included. The classification for each image will be recorded.

I think the output will be something like this where each keyword that was seen is given a weight:

{
    "keyword1": 1.25,
    "keyword2": 0.75,
    "keyword3": -8.6,
    ...
}

Then, when given an unclassified image, my program will use those determined weights to say if or if not that unclassified image should be included.

What types of techniques should I look into for this? One consideration I'm thinking of is that the frequency of a keyword needs to be considered in the weight. If a word only shows up a single time, and I say that associated image does not belong in the set, then that keyword probably shouldn't be 100% bad.
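
One simple starting point is a smoothed log-odds weight per keyword, in the style of naive Bayes. Logistic regression on a bag-of-keywords encoding would be a more standard tool for this. The sketch below is only an illustration: the function names, the `alpha` smoothing parameter, and the sample data are all made up.

```python
import math
from collections import Counter


def keyword_weights(labeled_images, alpha=1.0):
    """labeled_images: iterable of (keywords, included) pairs.

    Returns {keyword: smoothed log-odds that an image with it is included}.
    """
    pos, neg = Counter(), Counter()
    n_pos = n_neg = 0
    for keywords, included in labeled_images:
        if included:
            n_pos += 1
            pos.update(set(keywords))
        else:
            n_neg += 1
            neg.update(set(keywords))
    weights = {}
    for kw in set(pos) | set(neg):
        # Additive smoothing keeps a keyword seen once from becoming 100% good or bad.
        p_in = (pos[kw] + alpha) / (n_pos + 2 * alpha)
        p_out = (neg[kw] + alpha) / (n_neg + 2 * alpha)
        weights[kw] = math.log(p_in / p_out)
    return weights


def score(keywords, weights):
    """Sum of the known keyword weights; above 0 leans towards 'include'."""
    return sum(weights.get(kw, 0.0) for kw in set(keywords))


labeled = [
    (["earth", "space", "planet", "photo"], True),
    (["moon", "space", "photo"], True),
    (["sun", "star", "illustration", "diagram"], False),
    (["mars", "planet", "illustration"], False),
]
weights = keyword_weights(labeled)
print(score(["photo", "space"], weights) > 0)
```

Because of the smoothing term, a keyword that appears in only one image ends up with a smaller weight than one that appears consistently, which is the frequency effect described above.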

oblique belfry May 12, 2020, 7:40 PM

#

When orchestrating ML into the business, do you all use event sourcing and CQRS concepts? We are doing a lot of stream processing, and we are trying to plan out the best strategy for ML predictions.

sage latch May 12, 2020, 9:19 PM

#

Am i in the right place here to ask Questions about graphs? (nodes and vertices and stuffs)

wide rose May 12, 2020, 9:34 PM

#

#algos-and-data-structs

#

computer science is probably more appropriate

#

tho here is probably fine

sage latch May 12, 2020, 9:37 PM

#

okay, here goes: I'm writing something to represent a game I'm playing. it has a lot of 'currencies' which can be converted into other 'currencies'. Some of them i can directly assign a value to (in this case a power increase/currency)

#

there are loops in this graph, and I can not guarantee that the dev was smart enough to prevent diverging loops

#

Do you have any hints on how to calculate the 'value' for each currency, when factoring in conversions?

#

I'm thinking about setting all values to 0, except for those which I can directly assign.

#

all directly assigned currencies go into a queue.

#

then I take one item out of the queue and update all currencies which can convert into that currency. I will save a 'value' for each currency the currency can convert to. Then I add that currency to the queue

#

that by itself does not terminate.

wide rose May 12, 2020, 9:40 PM

#

so its a BFS?

sage latch May 12, 2020, 9:40 PM

#

whats a BFS?

wide rose May 12, 2020, 9:40 PM

#

breadth first search

#

here is what i would do instead

#

searching seems overcomplicated from what i understand

#

if you are able to recode it

#

just make a class for currency then have a method converting between

#

or a function

sage latch May 12, 2020, 9:41 PM

#

the issue are the loops in that directed graph

wide rose May 12, 2020, 9:42 PM

#

do you have a picture of the graph you can upload

sage latch May 12, 2020, 9:43 PM

#

the values should converge when going through a cycle. So if the increase when updating a node is negligible i won't add it to the queue. (hm... there is the issue of possible multiple small updates?)

#

so it will eventually terminate if all cycles converge

#

I'm unsure how to handle the (probably not happening) case of a diverging cycle

wide rose May 12, 2020, 9:44 PM

#

what do you mean by diverging cycle?

#

like it goes to infinity

#

so it just gets trapped?

sage latch May 12, 2020, 9:45 PM

#

trivial example: i can buy 2 bananas for 1 apple. and i can buy 2 apples for 1 banana

#

so if I were to assign some base value to an apple from elsewhere, the value of an apple would diverge by swapping between apples and bananas

#

ofc the graph I'm talking about is a bit more complicated ^.^

wide rose May 12, 2020, 9:47 PM

#

im not sure why the value would diverge

#

do you have some code?

sage latch May 12, 2020, 9:47 PM

#

not yet

#

1 apple = 2 bananas = 4 apples = 8 bananas = 16 apples = 32 bananas

#

so if my 'base value' for an apple is non-zero it will diverge

wide rose May 12, 2020, 9:49 PM

#

wait that doesnt make sense tho

sage latch May 12, 2020, 9:49 PM

#

more realistic case: 1 apple = 5 dollars; 1 apple = 2 bananas; 1 banana = 3 dollars. whats the dollar value of a banana

wide rose May 12, 2020, 9:49 PM

#

1 apple= 2 bananas = 4 apples

#

that doesnt make sense

#

thats where your error is

#

that cant be true

sage latch May 12, 2020, 9:49 PM

#

the trades are not transitive or reflective

#

it is a directed graph

#

so bananas->apples needs not be 1/(apples->bananas)

#

the diverging case would only happen if the dev of the game seriously messes up. But i can not exclude the possibility

wide rose May 12, 2020, 9:52 PM

#

you could probably just add a fail safe in the code

#

but why are trades not transitive

sage latch May 12, 2020, 9:53 PM

#

maybe transitive was the wrong word

#

but they surely are not reflective

#

yeah they should be transitive, I used the wrong word there sorry

wide rose May 12, 2020, 9:54 PM

#

yea but if its not reflective how does anything have value

sage latch May 12, 2020, 9:56 PM

#

example: I can get 1 banana for 2 apples or 1 apple for 2 bananas but i can also sell 1 apple for 1 dollar (my 'value' is measured in dollars here)

#

wait

#

edited

#

in that case 1 apple is worth 1 dollar and a banana is worth 0.5 dollars

#

thats the simplest case i can imagine for a non-reflective loop with convergent values

#

also the transitions are more complicated.... I think a graph could not be enough to represent it. for example you could get 1 apple 3 kiwi and a grapefruit for 10 bananas. (but only all at once, no single trading) in some cases
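
The queue idea above can be sketched as repeated relaxation over all trades, similar in spirit to Bellman-Ford, with a round limit acting as the fail safe for diverging loops. This is a hedged illustration: the function name, the `(source, amount_in, target, amount_out)` tuple layout, and the default limits are assumptions, and bundle trades (several items at once) are not covered.

```python
import math


def currency_values(direct, trades, max_rounds=100, tol=1e-9):
    """direct: {currency: directly known value}.

    trades: list of (source, amount_in, target, amount_out), meaning
    `amount_in` units of source convert into `amount_out` units of target.
    Returns {currency: value}; raises ValueError if values never settle.
    """
    values = {c: 0.0 for trade in trades for c in (trade[0], trade[2])}
    values.update({c: float(v) for c, v in direct.items()})
    for _ in range(max_rounds):
        changed = False
        for source, amount_in, target, amount_out in trades:
            # One unit of source is worth at least what it can be converted into.
            candidate = values[target] * amount_out / amount_in
            if not math.isfinite(candidate):
                raise ValueError("value overflowed; a conversion loop diverges")
            if candidate > values[source] + tol:
                values[source] = candidate
                changed = True
        if not changed:
            return values
    raise ValueError("values did not settle; a conversion loop probably diverges")


# The example from the chat: 2 apples -> 1 banana, 2 bananas -> 1 apple, apple = 1 dollar.
print(currency_values({"apple": 1.0}, [("apple", 2, "banana", 1), ("banana", 2, "apple", 1)]))
```

For the chat example this settles at 1 dollar per apple and 0.5 dollars per banana. With the "2 bananas for 1 apple, 2 apples for 1 banana" trades the values keep growing each round, so the round limit raises instead of looping forever.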

wide rose May 12, 2020, 10:02 PM

#

So is a banana worth a dollar?

#

If not then you have arbitrage in the economy

sage latch May 12, 2020, 10:03 PM

#

there is no supply and demand

#

its all 'trades' set by the game dev

#

I think I have an idea how to tackle it. I may need to divide by zero a bit, but thats okay ;)

wide rose May 12, 2020, 10:15 PM

#

ok if you have some code or a graph let me know

wise igloo May 12, 2020, 10:28 PM

#

Best intro data science course also what should I Know before getting into an intro to data science course?

sage latch May 12, 2020, 10:32 PM

#

@wide rose is a json-serialized dictionary with dummy data (some data I don't yet have accurate values of, other data depending on my gamestate and I haven't set up those calculations yet) okay?

wide rose May 12, 2020, 11:09 PM

#

Hey sorry I have to do some studying atm

#

i might be able to help later

#

@sage latch

sage latch May 12, 2020, 11:10 PM

#

It's 1am here, maybe tomorrow?

#

just started making some code to input the data. started with nested loops with global data, now refactoring to reasonable function calls. too lazy to refactor out the global data dict hehe

wide rose May 12, 2020, 11:48 PM

#

hahahha

wise igloo May 13, 2020, 2:18 AM

#

Thanks guys

trail parcel May 13, 2020, 3:37 AM

#

Found a video for creating deep fakes https://youtu.be/RsOJJd1q6Bg

YouTube

ProgrammingHut

Deep fakes in just minutes | [step-by-step] | google colab | links ...

welcome, creating deep fake used to require high computation but stick along with this video as i shown each step to create your own deep fake video.
you can also check below links for more such videos.

CONSIDER SUBSCRIBING

project *
handwritten digit recognition : https...

▶ Play video

lapis sequoia May 13, 2020, 4:50 AM

#

Hi, I have a question regarding data visualization. I have a simple ORM Event model with a single attribute, date_created. (to track volume of API calls over time) How would I go about visualizing this in graphs of different resolutions? For instance, there may be a graph with 15 min resolution that sums up all of the events that occurred within that time span, or an hour resolution that sums up the events within that hour. Isn't there some python library that can make an interactive graph with JS that plugs into the web frameworks?

This is all new to me, any pointing in the right direction is appreciated. Thanks.

flat quest May 13, 2020, 6:08 AM

#

I heard plotly is based on plotly.js so that might be what ur looking for. Though its performance isn’t as good as the c based ones.

lapis sequoia May 13, 2020, 6:49 AM

#

Thank you. And 'histogram' was the word I was looking for, a lot more pieces fell into place once I discovered the concept I had in my head had an actual name lol
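
For the counting side of this, here is a small standard-library sketch that buckets event timestamps at an arbitrary resolution. It assumes naive `datetime` values pulled from the `date_created` column; the function name `bucket_counts` is hypothetical.

```python
from collections import Counter
from datetime import datetime, timedelta


def bucket_counts(timestamps, resolution):
    """Count events per time bucket; `resolution` is a timedelta."""
    epoch = datetime(1970, 1, 1)
    counts = Counter()
    for ts in timestamps:
        # Floor each timestamp to the start of the bucket it falls into.
        bucket_start = ts - ((ts - epoch) % resolution)
        counts[bucket_start] += 1
    return dict(sorted(counts.items()))


events = [
    datetime(2020, 5, 13, 10, 1),
    datetime(2020, 5, 13, 10, 14),
    datetime(2020, 5, 13, 10, 16),
    datetime(2020, 5, 13, 11, 5),
]
print(bucket_counts(events, timedelta(minutes=15)))
print(bucket_counts(events, timedelta(hours=1)))
```

The resulting bucket-to-count mapping can then be handed to whichever plotting library is used as a bar chart. With pandas, `resample` on a datetime index does the same bucketing.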

solar phoenix May 13, 2020, 9:44 AM

#

does anyone have any experience with Numba?

#

or with vectorizing loops with Numpy

valid drum May 13, 2020, 10:21 AM

#

Hi,
I’m trying to implement a CNN with Numpy only and I have a problem that the Convolutional layer is very slow - takes ~1 second...


def run(self, x, is_training=True):
        """Convolves the filters over 'x' """
        if self.filters is None:
            self.filters = self.initialize_weights((self.units, x.shape[0], *self.filter_size))
            self.grads = self._init_bias_weight_like()

        if is_training:
            self.cache['X'] = x

        n_filt, dim_filt, size_filt, _ = self.filters.shape
        dim_img, size_img, _ = x.shape

        if dim_filt != dim_img:
            raise ValueError("Image and filter dimension must be the same")

        size_out = int((size_img - size_filt) / self.stride) + 1

        out = np.zeros((n_filt, size_out, size_out))
        for filt in range(n_filt):
            y_filt = y_out = 0
            while y_filt + size_filt <= size_img:
                x_filt = x_out = 0
                while x_filt + size_filt <= size_img:
                    out[filt, y_out, x_out] = np.sum(
                        self.filters[filt] * (x[:, y_filt: y_filt + size_filt, x_filt:x_filt + size_filt])
                        + self.bias[filt]
                    )
                    x_filt += self.stride
                    x_out += 1
                y_filt += self.stride
                y_out += 1

        out = self.activation.apply(out, is_training)
        return out

Does anybody have an idea how to improve it? Thanks
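
One common way to speed this up is to replace the Python loops with a strided window view plus a single `einsum`. The sketch below is a standalone function rather than a drop-in for the class above, and it assumes NumPy 1.20 or newer (for `sliding_window_view`), square filters, and a bias of shape `(n_filt,)`. It adds the bias once per output value, whereas the loop above adds it inside `np.sum`, which may or may not be intended.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view


def conv_forward(x, filters, bias, stride=1):
    """x: (channels, H, W); filters: (n_filt, channels, k, k); bias: (n_filt,).

    Returns the (n_filt, H_out, W_out) valid cross-correlation.
    """
    k = filters.shape[-1]
    # windows: (channels, H-k+1, W-k+1, k, k), then keep every `stride`-th position.
    windows = sliding_window_view(x, (k, k), axis=(1, 2))[:, ::stride, ::stride]
    # Multiply and sum over channels and both kernel axes in one call.
    out = np.einsum("chwij,fcij->fhw", windows, filters)
    return out + bias[:, None, None]
```

The window view does not copy the image, so the main cost is the `einsum` itself. The activation can be applied to the returned array exactly as before.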

uncut shadow May 13, 2020, 11:12 AM

#

Are you following any tutorial for that? (I'm asking cuz I'm curious)

valid drum May 13, 2020, 12:22 PM

#

@uncut shadow No

lapis ice May 13, 2020, 12:35 PM

#

Does anyone know a paper or a blog which goes in-depth about the architecture/technologies?

eager heath May 13, 2020, 12:43 PM

#

Computer architectures?

wise igloo May 13, 2020, 12:49 PM

#

Thanks guys

lapis ice May 13, 2020, 1:02 PM

#

@eager heath my apology I did not include the extra crucial detail.
I am looking for GAN generator/discriminator data basically

wise igloo May 13, 2020, 1:46 PM

#

I appreciate the advice

lapis sequoia May 13, 2020, 1:46 PM

#

can anyone recommend a good data science book for a beginner ?

steady sparrow May 13, 2020, 2:06 PM

#

can anyone recommend a good data science book for a beginner ?
@lapis sequoia python datascience handbook

lapis sequoia May 13, 2020, 2:06 PM

#

ok thanks

#

hows the book "data science from scratch"?

#

@steady sparrow

steady sparrow May 13, 2020, 2:07 PM

#

📎 unknown.png

lapis sequoia May 13, 2020, 2:08 PM

#

is it good for a beginner ?

#

@steady sparrow

steady sparrow May 13, 2020, 2:09 PM

#

I think yes
Iam using it and it is good with me

#

iam also beginner btw

lapis sequoia May 13, 2020, 2:10 PM

#

ok

#

thanks

steady sparrow May 13, 2020, 2:11 PM

#

@lapis sequoia you are welcome

solar phoenix May 13, 2020, 2:24 PM

#

what is the fastest way to iterate through a function 100s or 1000s of times that gives a string output and add the output to a list?

#

at the moment i just do, for i in range(1000):

#

then append the output to a list

#

speed matters because i will end up doing it several million times

ivory plank May 13, 2020, 2:43 PM

#

@solar phoenix As you probably already know, append has an amortized O(1) cost [which doesn't mean that each individual append costs O(1) time; it just means that because python over-allocates with append, on average, the append operation costs O(1)], so over many many times, that O(1) cost should be very close to actually being O(1)

spark stag May 13, 2020, 2:44 PM

#

@solar phoenix have you tried using list comprehentions (sry for that spelling) because as it doesn't call append it is much faster to create large list item by item

#

iirc it is also faster than list(map(...))

kindred finch May 13, 2020, 2:45 PM

#

It still has to recreate the list every now and then, the same as append although it does have some other optimisations

ivory plank May 13, 2020, 2:45 PM

#

It might be faster to pre-allocate a list if you know the exact number of iterations ahead of time. But, even that is debatable. You can easily test which way is faster in your particular code using a smaller run with python's timeit

#

check out an actual time comparison and do one yourself maybe https://stackoverflow.com/questions/22225666/pre-allocating-a-list-of-none

Stack Overflow

Pre-allocating a list of None

Suppose you want to write a function which yields a list of objects, and you know in advance the length n of such list.

In python the list supports indexed access in O(1), so it is arguably a good...

solar phoenix May 13, 2020, 3:08 PM

#

@spark stag I have not tried that but will now

#

@ivory plank ok did not know about pre allocation

#

thanks all

ivory plank May 13, 2020, 3:18 PM

#

Try this badly written quick program I wrote @solar phoenix , you can appropriately define new ways and give it to the list of functions to time them

#

import timeit
from collections import deque  # needed by deque_way below


def append_way(n, to_append):
    l = []
    for _ in range(n):
        l.append(to_append)

def pre_allocate(n, to_append):
    l = [""]*n
    for i in range(n):
        l[i] = to_append

def list_comprehension(n, to_append):
    l = [to_append for _ in range(n)]

def deque_way (n, to_append):
    d= deque()
    for _ in range(n):
        d.append(to_append)

def main():
    n = 10**1
    to_append = "test"
    for func in ["append_way", "pre_allocate", "list_comprehension", "deque_way"]:
        seconds = timeit.timeit("{}(n,to_append)".format(func), setup="from __main__ import {};n={};to_append='{}'".format(func, n, to_append), number = 1)
        print("{} takes {} seconds".format(func, seconds))



if __name__ == "__main__":
    main()

solar phoenix May 13, 2020, 3:25 PM

#

@ivory plank awesome will do, thanks so much

ivory plank May 13, 2020, 3:30 PM

#

it's not actually seconds btw, it's usecs. I forgot about the defaults (EDIT: actually, it's seconds. The thing that's actually the problem here is that timeit by default repeats the code 1M times. To make it 1 time, add "number =1" in the timeit call. But, none of this actually changes the difference in timing between the different functions)

solar phoenix May 13, 2020, 4:07 PM

#

Understood. A million might be excessive...

lapis sequoia May 13, 2020, 5:08 PM

#

I have a pandas dataframe with a column 'score' in the range [-1, 1] and I have 10-15 terms in other columns. What would be the best tool to understand how these n-terms predict the score?

polar acorn May 13, 2020, 5:28 PM

#

Finding the correlation between each continuous feature and the score is a good start. Plotting each feature vs the score also gives a good indication.
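
A quick sketch of the correlation step with pandas. The column names and numbers are made up, and Pearson correlation (the `DataFrame.corr` default) only captures linear relationships.

```python
import pandas as pd

# Made-up stand-in data: a score column plus three numeric term columns.
df = pd.DataFrame({
    "score": [-0.8, -0.2, 0.1, 0.5, 0.9],
    "term_a": [1, 2, 3, 4, 5],
    "term_b": [5, 3, 4, 1, 2],
    "term_c": [2, 2, 3, 2, 3],
})

# Correlation of every term with the score, strongest positive first.
correlations = df.corr()["score"].drop("score").sort_values(ascending=False)
print(correlations)
```

Terms near +1 or -1 are worth plotting against the score first. Terms near 0 may still matter in a non-linear way.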

wise igloo May 13, 2020, 5:46 PM

#

You guys are so helpful! Thank you

lapis sequoia May 13, 2020, 6:22 PM

#

@polar acorn thank you

polar acorn May 13, 2020, 6:44 PM

#

@wise igloo Are you being sarcastic or something?

wise igloo May 13, 2020, 6:45 PM

#

?

#

Besides python what else should I know before going into an intro to data science course?

polar acorn May 13, 2020, 7:20 PM

#

Programming-wise you should probably take a quick look at numpy and pandas. Math-wise you should be familiar with calculus, basic stats and some linear algebra. All of these can be learnt at the same time as you're doing an intro to data science course, and that is perhaps what I would recommend. Just jump into the course, pause and dive into the math or libraries you don't understand.

lapis sequoia May 13, 2020, 7:59 PM

#

@polar acorn from your comment I take it you mean to start with scatter plots then go into linear regression?

lapis sequoia May 13, 2020, 8:16 PM

#

Hello,

I have a problem and I have spent about two hours on it but still unsolved!!

I have a dataset which contains nearly 600,000 data. It is the air pollution of a city. I want to train my machine with 599,999 other data and predict one of them.

Like I drop the data in row 100 and train the machine with 599,999 data and my goal is to predict the dropped row. But I get an error.

I really appreciate it if you could help me.

#

df = df.head(100000)
df["Measurement date"] = pd.to_datetime(df["Measurement date"])
df["Year"] = df["Measurement date"].apply(lambda x:x.year)
df["Month"] = df["Measurement date"].apply(lambda x:x.month)
df["Day"] = df["Measurement date"].apply(lambda x:x.dayofweek)
df.drop(["Latitude","Longitude","Address","Measurement date"] , axis=1 , inplace=True)
df.drop(100, axis=0, inplace=True)

a=[101,0.004,0.05,0.002,0.9,59,39,2017,1,3]
mine = pd.DataFrame(index=["Station code","SO2","NO2","O3","CO","PM10","PM2.5","Year","Month","Day"] ,
data=a , columns=["Goal"])

y = df["PM10"]
X = df[["Station code","SO2","NO2","O3","CO","PM2.5","Year","Month","Day"]]
from sklearn.model_selection import train_test_split
X_train, y_train, X_test, y_test = train_test_split(X,y,test_size=0.4, random_state=101)
from sklearn.linear_model import LinearRegression
lm = LinearRegression()
lm.fit(X_train, y_train)
predictions = lm.predict(mine)

print(len(predictions))
print("==========================")
print(df["PM10"])

#

This is the code.

#

📎 Screenshot_2020-05-13_at_20.46.34.png

umbral aspen May 13, 2020, 8:31 PM

#

Hi I am working on a multilabel classification problem where I have many possible labels (probably like 30-40 in total)... Generally speaking, is creating a somewhat accurate multilabel classification model straightforward when I have so many possible labels? I have about 50k images which I will have tagged and I am looking to train locally with a 1070 gpu

ivory plank May 13, 2020, 9:21 PM

#

@umbral aspen A 30-40 dimensional output isn't uncommon for a conventional problem. The difficulty of the problem depends entirely on your data. The one thing I would look out for is noise in your data and negative samples/negative data [if a particular sample belongs to class 0, does your data ensure that it doesn't also belong to class 1?]. Your GPU is a little underpowered to train state of the art models with large datasets, but also remember that the training difficulty is usually more dependent on the complexity of the model you pick and not the problem itself. I'd personally first do data analysis to find what makes my data easy/difficult to model, and then start with the most simple model that I believe would work for my problem, only moving on to more complex models if my performance isn't sufficient. If your data isn't very complex, even an SVM is pretty good at modeling things.

frail pawn May 13, 2020, 9:32 PM

#

http://nvidia-research-mingyuliu.com/gaugan check that site

nvidia-research-mingyuliu.com

wise igloo May 13, 2020, 9:34 PM

#

@lapis sequoia are you looking at air quality data

#

Nvm I read your previous post

paper niche May 14, 2020, 12:42 AM

#

X_train, X_test, y_train, y_test = train_test_split(...)

@lapis sequoia
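
To spell that out with a runnable sketch: the data below is a tiny made-up stand-in with only two of the features, not the real air pollution set. Besides the return order of `train_test_split`, the row passed to `predict` probably also needs to be a single row with the same feature columns as `X` and without the `PM10` target. The original `mine` frame appears to use the feature names as its index and includes `PM10`.

```python
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

# Tiny made-up stand-in for the air pollution data.
df = pd.DataFrame({
    "SO2": [0.004, 0.005, 0.003, 0.006, 0.004, 0.005, 0.003, 0.007],
    "PM2.5": [39, 45, 20, 60, 35, 50, 18, 70],
    "PM10": [59, 66, 31, 88, 52, 75, 28, 101],
})
features = ["SO2", "PM2.5"]
X, y = df[features], df["PM10"]

# Order matters: both X splits come first, then both y splits.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=101
)

lm = LinearRegression().fit(X_train, y_train)

# The row to predict: one row, same feature columns, no target column.
mine = pd.DataFrame([[0.004, 39]], columns=features)
prediction = lm.predict(mine)
print(prediction.shape)
```

If the goal is only to predict the one dropped row, the train/test split is not strictly needed: fit on everything except that row, then predict it.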

umbral aspen May 14, 2020, 4:45 AM

#

@ivory plank I need to first do some manual tagging of the images but then the quality will be good. Also this isn't a problem where I have to classify 1 class per image, as each image could have multiple classes (multi label)... So not sure how much extra complexity this adds to my model...

ivory plank May 14, 2020, 4:46 AM

#

Ah that sounds like a noisy dataset

#

But your problem sounds very similar to ImageNet

#

You might be able to use large parts of ResNet and our current efforts on efficient ImageNets

prisma verge May 14, 2020, 5:18 AM

#

hey, fellow people! i've got pretrained GPT-2 model here that I want to load with gpt2_simple library. how could i do this? two models don't match up, especially since one is tensorflow model and other is pytorch. maybe anyone got gpt2_simple analog for pytorch model?

📎 unknown.png

blazing bridge May 14, 2020, 6:03 AM

#

@lapis sequoia I saw that you were looking for stuff related to Data Science. I would recommend this, https://www.youtube.com/channel/UCKaajyjktvduM6mmuBtAOyg

YouTube

Coding Matrix

Welcome to our channel, our names are Hamad Sultan and Shaheed Mohamed Ali. We are two aspiring high school students and programmers who wish to share our kn...

lapis sequoia May 14, 2020, 6:12 AM

#

@blazing bridge thanks

blazing bridge May 14, 2020, 6:12 AM

#

I would appreciate that if you like it that you subscribe

#

There will be another course on pandas and sci kit learn and matplotlib

lapis sequoia May 14, 2020, 6:13 AM

#

Can u recommend any book for data science?

blazing bridge May 14, 2020, 6:13 AM

#

Probably the python Data science handbook

#

It has good reviews

prisma verge May 14, 2020, 7:47 AM

#

gosh, i'm wrapping my head around it for 2 hours already and still can't get it to work

#

https://github.com/mgrankin/ru_transformers
does anyone get how to get those models up and running?

GitHub

mgrankin/ru_transformers

Contribute to mgrankin/ru_transformers development by creating an account on GitHub.

#

it seems that those models just lack tokenizers and i don't understand how to finetune them without tokenizers

flat quest May 14, 2020, 8:09 AM

#

i haven't looked through all the code
but the github documentation has a tokenizer step (5.5)

silk forge May 14, 2020, 8:59 AM

#

hey guys

#

well i plotted a decision tree using matplotlib but i can't zoom in for some reason

#

📎 unknown.png

#

im talking about this zoom to rectangle thing

#

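import matplotlib.pyplot as plt
from sklearn.tree import plot_tree

# clf is the already-fitted decision tree (not shown here)
featureNames = ["Sex", "FamilySize", "Age", "Pclass", "Fare", "Embarked"]
classNames = ["Survived", "Succumbed"]
fig, ax = plt.subplots(figsize=(10, 10))
plot_tree(clf, feature_names=featureNames, class_names=classNames, filled=True, ax=ax)
plt.show()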

#

this is my code

eternal orbit May 14, 2020, 9:03 AM

#

really

spark stag May 14, 2020, 9:18 AM

#

@silk forge the code shouldn't be the issue. Are you just clicking the magnifying glass, or dragging to create a rectangle for it to zoom into? If that doesn't work either, you can try holding the right mouse button and dragging to zoom

silk forge May 14, 2020, 9:21 AM

#

i created a rectangle

#

but it still wont zoom

#

@spark stag

spark stag May 14, 2020, 9:22 AM

#

if there are no errors idk what the issue could be. did you try using right mouse drag to zoom in? it rescales each axis as you drag. it's not the most convenient fix, but if it works it's better than nothing

crimson umbra May 14, 2020, 10:34 AM

#

Hey can anyone help me with something related to data visualisation

#

I wanna recreate this graph using matplotlib and I need help figuring out the code

eternal orbit May 14, 2020, 10:50 AM

#

What does the code look like

merry violet May 14, 2020, 12:09 PM

#

@crimson umbra I am happy to have a look for you. Send me the code if you can.

compact delta May 14, 2020, 1:29 PM

#

Hello guys, I'm currently struggling a bit implementing SA (simulated annealing) to optimize a solution for an assignment. My solution consists of a list containing numbers from 0 to 11, each representing a position in a storage array. Anyway, my code executes but does not find any improvement, which is definitely wrong. Does anyone see something wrong here? For the neighbour solution, our script said to pick some random neighbour

📎 unknown.png

paper niche May 14, 2020, 1:37 PM

#

ur last if statement says currentCost < currentCost? o.0

compact delta May 14, 2020, 1:40 PM

#

oh ^^

#

thx, but still no improvement .. kinda strange

paper niche May 14, 2020, 1:53 PM

#

so you’re only doing 1 round of randint for the neighbour before dropping the temperature? when I implemented this for MC a while ago I seem to recall I attempted multiple random “flips” per temperature

compact delta May 14, 2020, 1:54 PM

#

The TA laid some groundwork. And the pseudocode we got was this

📎 unknown.png

#

📎 unknown.png

#

No word about how to choose our neighbour, so i just assumed it's like this

#

maybe I have to play with the temps a little bit more but its like no improvement at all, so i thought it must be because i made a mistake in the algo somewhere or with choosing the neighbour but dunno

paper niche May 14, 2020, 2:01 PM

#

yeah the neighbour function seems off.. what's the physical context? as in, what does a neighbour mean in this assignment?

#

when I did this, it was in the context of modeling spins in a lattice. so states are up/down of spins in a lattice, and the neighbours are well-defined

compact delta May 14, 2020, 2:05 PM

#

We have a warehouse with some storage shelves. Each iteration, a shelf from our demand list gets called, placed in a queue, and then placed back into the warehouse. We have to optimize the location where it's placed back so that the distance these shelves move is minimized. Each number in the solution list is the nth free slot, so 0 is the nearest free slot (the warehouse is a 1D array) and 11 the farthest free one

paper niche May 14, 2020, 2:07 PM

#

ah it's a 1D array. hmm so wouldn't the neighbour just be +/- 1 of the current index?

compact delta May 14, 2020, 2:07 PM

#

Doubt changing more than 1 of the numbers in our list helps. Thing is to calc the cost you have to simulate the whole process and the only chance I see improving it is with a lower temp and something like 0.9999 as cooling coeff. which will take ages to compute

#

Yeah also thought of it. Will try it out. They didnt specify neighbour in our class so I thought like changing a number is also a neighbour but +-1 will prob see better results

paper niche May 14, 2020, 2:09 PM

#

as in, before the while loop, select a random position, inside the while loop find its neighbour (50% chance of +/- 1), change its state, calculate the Cost, perform the acceptance algorithm

#

the next loop, pick its neighbour, and so on
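A minimal sketch of the +/- 1 neighbour idea discussed above, not the assignment's actual code. Everything here is hypothetical: `toy_cost` stands in for the warehouse simulation, the names and parameter values are made up, and the inner loop tries several moves per temperature, as mentioned earlier in the thread. Note that the acceptance test compares the candidate's cost against the current cost.

```python
import math
import random


def toy_cost(solution):
    # Stand-in cost (assumption): the real assignment simulates the warehouse.
    return sum(solution)


def neighbour(solution, rng, low=0, high=11):
    # Copy the solution and move one randomly chosen entry by +1 or -1,
    # clamped to the valid slot range [low, high].
    candidate = list(solution)
    i = rng.randrange(len(candidate))
    step = rng.choice((-1, 1))
    candidate[i] = min(high, max(low, candidate[i] + step))
    return candidate


def anneal(start, cost, rng, temp=10.0, cooling=0.95, min_temp=1e-3,
           moves_per_temp=20):
    current, current_cost = list(start), cost(start)
    best, best_cost = list(current), current_cost
    while temp > min_temp:
        for _ in range(moves_per_temp):
            cand = neighbour(current, rng)
            cand_cost = cost(cand)
            delta = cand_cost - current_cost
            # Always accept improvements; accept worse moves with
            # probability exp(-delta / temp) (Metropolis criterion).
            if delta <= 0 or rng.random() < math.exp(-delta / temp):
                current, current_cost = cand, cand_cost
                if current_cost < best_cost:
                    best, best_cost = list(current), current_cost
        temp *= cooling
    return best, best_cost


rng = random.Random(0)
start = [rng.randint(0, 11) for _ in range(8)]
best, best_cost = anneal(start, toy_cost, rng)
print(toy_cost(start), "->", best_cost)
```

Keeping track of `best` separately means a late uphill move cannot lose the best solution seen so far.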

compact delta May 14, 2020, 2:12 PM

#

If we select a random pos before our while loop, it won't go over the other positions. After it has run through once, it's done, and I'm not allowed to change that

compact delta May 14, 2020, 2:32 PM

#

yeah big f should have chosen web-dev that would be an easy a

sick nacelle May 14, 2020, 3:27 PM

#

Hi everyone, I'm currently working with pandas. I have an Excel file; when I load it into a pandas df, some cells in some columns show as NaN, but in the Excel file these cells have values. Is this because of the values' type in the Excel file?

raw rapids May 14, 2020, 6:15 PM

#

thats really weird behavior @sick nacelle

#

do you mind uploading the excel sheet

#

and posting your code

brisk moth May 14, 2020, 6:35 PM

#

does anyone know how FSTs work

sick nacelle May 14, 2020, 7:53 PM

#

@raw rapids I can provide the excel file. The code part it's just loading the excel in a df, though i got the code that generates that excel file.

arctic wedgeBOT May 14, 2020, 8:02 PM

#

Hey @sick nacelle!

It looks like you tried to attach file type(s) that we do not allow (.xlsx). We currently allow the following file types: .3gp, .3g2, .avi, .bmp, .gif, .h264, .jpg, .jpeg, .m4v, .mkv, .mov, .mp4, .mpeg, .mpg, .png, .tiff, .wmv, .svg, .psd, .ai, .aep, .xcf, .mp3, .wav, .ogg.

Feel free to ask in #community-meta if you think this is a mistake.

real wigeon May 14, 2020, 9:25 PM

#

so

#

i have no experience with jupyter notebooks, I watched a 30 minute video.

#

is it worth spending time working with it, or should I do a dashboard with some descriptive stats

raw rapids May 14, 2020, 10:13 PM

#

well if you are planning on doing more data science

#

then you should make a larger attempt to learn jupyter notebook

#

but a dashboard with descriptive stats is fine too

#

@sick nacelle , I dont know how you can share the excel files. If you find some online sharing for excel you could ping me

#

@brisk moth, FSTs are usually for NLP projects

brisk moth May 14, 2020, 10:21 PM

#

ya i know

#

id like to implement one ive drawn by hand into python

raw rapids May 14, 2020, 10:22 PM

#

o

brisk moth May 14, 2020, 10:22 PM

#

idk how

raw rapids May 14, 2020, 10:22 PM

#

you want to know how to code it in python

brisk moth May 14, 2020, 10:22 PM

#

yea

#

im also not sure if the fst is even right but yolo

#

do you know how to do that

raw rapids May 14, 2020, 10:23 PM

#

I only know the concept lol

#

I've never coded one before

brisk moth May 14, 2020, 10:23 PM

#

bummer

raw rapids May 14, 2020, 10:24 PM

#

sorry tho

#

there is an openfst package in python

#

and pywrapfst

#

im more into spacy for nlp

#

@brisk moth

#

open fst seems really easy to use

#

https://pypi.org/project/openfst-python/

PyPI

openfst-python

Stand-alone OpenFST bindings for Python

brisk moth May 14, 2020, 10:44 PM

#

so the states are like the Q0 and Q1 and the arcs are the transitions?
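In short, yes: states are the nodes and arcs are the labelled transitions between them. Here is a minimal sketch in plain Python (not the openfst API) of a toy deterministic transducer. The states `q0`/`q1`, the alphabet and the rewrite rule are all made up for illustration: it rewrites the first `a` of every run of `a`s to `x`.

```python
# Each arc maps (current_state, input_symbol) -> (output_symbol, next_state).
ARCS = {
    ("q0", "a"): ("x", "q1"),  # first 'a' of a run is rewritten to 'x'
    ("q0", "b"): ("b", "q0"),
    ("q1", "a"): ("a", "q1"),  # later 'a's in the same run pass through
    ("q1", "b"): ("b", "q0"),  # a 'b' ends the run
}
START = "q0"
FINAL = {"q0", "q1"}


def transduce(text):
    """Run the transducer; return the output string, or None if rejected."""
    state = START
    out = []
    for symbol in text:
        if (state, symbol) not in ARCS:
            return None  # no arc for this symbol in this state
        output, state = ARCS[(state, symbol)]
        out.append(output)
    return "".join(out) if state in FINAL else None


print(transduce("aabaa"))  # xabxa
```

A hand-drawn FST can be transcribed the same way: one dictionary entry per arrow in the drawing.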

real wigeon May 14, 2020, 11:57 PM

#

well if you are planning on doing more data science
@raw rapids yes, but I am trying to pad my resume asap

flat quest May 15, 2020, 12:20 AM

#

its no use padding ur resume unless you actually know how to work with your tools. Otherwise you'll be lost even if u get a job.
Concrete knowledge with your tools will also allow you to create better projects to pad your resume, so there's no point not learning them.

#

@real wigeon

real wigeon May 15, 2020, 12:21 AM

#

right, but jupyter notebooks is less important than knowing pandas, matplotlib, or numpy

#

?

flat quest May 15, 2020, 12:27 AM

#

well its one of the main ways of sharing data science related work
so if you want to display work you've done (for others to see) in an easy to run notebook, jupyter is usually a good way to go. Also, when ur running models / doing data science ur going to be making visualizations, which is much easier in jupyter usually.

charred blaze May 15, 2020, 12:52 AM

#

they're different tools for different things

real wigeon May 15, 2020, 1:00 AM

#

yeah

charred blaze May 15, 2020, 1:07 AM

#

having that said...

#

I'd say your assessment is right

humble gale May 15, 2020, 3:14 AM

#

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.manifold import TSNE

# load_alexa and load_dga are helper functions defined elsewhere (not shown)
x1_domain_list = load_alexa("top-100.csv")
x2_domain_list = load_dga("dga-cryptolocke-50.txt")
x3_domain_list = load_dga("dga-post-tovar-goz-50.txt")

x_domain_list = np.concatenate((x1_domain_list, x2_domain_list, x3_domain_list))

# label 0 = Alexa domains, label 1 = DGA domains
y1 = [0] * len(x1_domain_list)
y2 = [1] * len(x2_domain_list)
y3 = [1] * len(x3_domain_list)

y = np.concatenate((y1, y2, y3))

#print (x_domain_list)

# bigrams of single-character tokens
cv = CountVectorizer(ngram_range=(2, 2), decode_error="ignore",
                     token_pattern=r"\w", min_df=1)
x = cv.fit_transform(x_domain_list).toarray()

# apply KMeans and TSNE ...

k_means = KMeans(init='k-means++', n_clusters=2, random_state=170)
k_means.fit(x)

# assign the labels to a variable
k_means_labels = k_means.labels_
# assign the cluster centres to a variable
k_means_cluster_centers = k_means.cluster_centers_

x_embedd = TSNE(n_components=2,
                learning_rate=100,
                random_state=170).fit_transform(x)

y_pred = k_means.predict(x)

# fig, ax = plt.subplots(figsize=(7,7), dpi=100)
# plt.scatter(x_embedd[:, 0], x_embedd[:, 1], c=y_pred)

print('Before TSNE: ', x.shape)
# compares the cluster ids directly with the class labels
print('Accuracy: ', np.mean(y_pred == y) * 100)
print('After TSNE: ', x_embedd.shape)
```

#

please feel free to help, I don't know why my accuracy is at 17%, before it was 79%. Have I overfit my data or what? I'm a bit lost, thanks in advance
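One possible explanation, which is an assumption since the thread never confirms it: KMeans numbers its clusters arbitrarily, so cluster 0 need not correspond to class 0. Comparing `y_pred == y` directly can then report a very low score for a clustering that is mostly right with the ids swapped. A minimal sketch of scoring under the best cluster-to-class mapping follows; `clustering_accuracy` is a hypothetical helper, and it assumes there are no more clusters than classes.

```python
from itertools import permutations


def clustering_accuracy(y_true, y_pred):
    # Try every assignment of cluster ids to class labels and keep the
    # best agreement. Fine for a handful of clusters, too slow for many.
    clusters = sorted(set(y_pred))
    classes = sorted(set(y_true))
    best = 0.0
    for perm in permutations(classes, len(clusters)):
        mapping = dict(zip(clusters, perm))
        hits = sum(mapping[p] == t for t, p in zip(y_true, y_pred))
        best = max(best, hits / len(y_true))
    return best


# Toy labels (made up): the clustering is mostly right, but the ids are swapped.
y_true = [0, 0, 0, 0, 1, 1]
y_pred = [1, 1, 1, 0, 0, 0]

raw = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
print(f"raw: {raw:.2f}  best mapping: {clustering_accuracy(y_true, y_pred):.2f}")
```

On the toy labels the raw comparison scores 1 out of 6 while the swapped mapping scores 5 out of 6, so a low raw number does not by itself mean the clustering is bad.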

paper niche May 15, 2020, 3:38 AM

#

what did u do before?

blazing bridge May 15, 2020, 9:18 AM

#

Would you recommend the https://www.udemy.com/course/machinelearning/

Udemy

Machine Learning A-Z (Python & R in Data Science Course)

Learn to create Machine Learning Algorithms in Python and R from two Data Science experts. Code templates included.

rustic igloo May 15, 2020, 11:21 AM

#

Can anyone suggest a way i can paste code to Colab with Android mobile phone? Trying to be productive on the way to work and have sublime text on my phone. So once i finish a snippet, i thought I could just copy and paste it to Colab. BUT - so far i have been unsuccessful to paste to Colab via my mobile browser. The attached list appears when I hold my finger on the screen (no paste). Thanks!!

📎 unknown.png

#

btw, i've searched online a bit for the solution, but couldn't find anything useful. Wondering if problem is myself.

burnt swan May 15, 2020, 11:32 AM

#

Would you recommend the https://www.udemy.com/course/machinelearning/
@blazing bridge
I guess Udemy courses are good enough. Go for it. Just check the background of the instructor and the course first; in my opinion you can surely go for it

ivory plank May 15, 2020, 3:39 PM

#

@rustic igloo Even if the app doesn't allow you to paste, your mobile keyboard should allow you to do so regardless. You should be able to select a text you type and then paste using your phone's default menu or from your phone's keyboard/clipboard

sonic raft May 15, 2020, 5:18 PM

#

Hi guys! I have a very "newbie" question, so I'm learning about ML algorithms, and how to implement them in Python. So my question is about Linear Regression, If I split my data into train and test sets, should I count the accuracy for the train set too, or is it okay if I just count it for the test set?

uncut shadow May 15, 2020, 5:20 PM

#

you should count for both

next smelt May 15, 2020, 5:23 PM

#

Both. Because it will tell you about overfitting /underfitting

sonic raft May 15, 2020, 5:27 PM

#

Thanks! 🙂

#

What do you mean by overfitting / underfitting? (Too low, or too high accuracy score?)

rustic igloo May 15, 2020, 6:01 PM

#

@ivory plank thanks. I am finally able to paste using the phone's Gboard clipboard.

flat quest May 15, 2020, 6:03 PM

#

well a neural model tries to emulate the true relationship between the inputs and outputs right

but since we are not given data on the entire population (only a sample), the pattern in our sample is likely different from the true one.
When we try to fit on the relationship represented by our inputs and outputs in our sample too well (ie pick up patterns that are very subtle) our model will not generalize well to the pattern for the total population since these subtle patterns do not exist in the population.

Underfitting is when we fit on the sample so poorly, that we miss out on major patterns. These major patterns generally also occur in the true relationship, so its important to find those patterns.

Underfitting and overfitting are tradeoffs. The less you try to overfit, the more potential for underfitting.

@sonic raft
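To make the train-versus-test comparison concrete, here is a small sketch in plain Python on synthetic data. Everything in it is made up for illustration: the data-generating line, the noise level, and the three toy models (a constant mean that underfits, a least-squares line, and a nearest-neighbour lookup that memorizes the training set). It reports R², since "accuracy" is not really defined for regression.

```python
import random


def r2(y_true, y_pred):
    # Coefficient of determination: 1 is perfect, 0 matches predicting the mean.
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1 - ss_res / ss_tot


def fit_mean(xs, ys):
    # Underfits: ignores x entirely.
    mean = sum(ys) / len(ys)
    return lambda x: mean


def fit_line(xs, ys):
    # Ordinary least squares for a single feature.
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    slope = sxy / sxx
    return lambda x: my + slope * (x - mx)


def fit_nearest(xs, ys):
    # Memorizes the training set: returns the y of the closest training x.
    pairs = list(zip(xs, ys))
    return lambda x: min(pairs, key=lambda p: abs(p[0] - x))[1]


rng = random.Random(0)
xs = [rng.uniform(0, 10) for _ in range(100)]
ys = [2 * x + 1 + rng.gauss(0, 2) for x in xs]
x_train, y_train = xs[:80], ys[:80]   # 80/20 split
x_test, y_test = xs[80:], ys[80:]

scores = {}
for name, fit in [("mean", fit_mean), ("line", fit_line), ("nearest", fit_nearest)]:
    model = fit(x_train, y_train)
    train = r2(y_train, [model(x) for x in x_train])
    test = r2(y_test, [model(x) for x in x_test])
    scores[name] = (train, test)
    print(f"{name:8s} train R^2={train:.2f}  test R^2={test:.2f}")
```

A low score on both sets points to underfitting, while a perfect training score paired with a lower test score, as the memorizing model typically shows, points to overfitting. That gap is only visible when both scores are computed.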

sonic raft May 15, 2020, 6:10 PM

#

Thanks, that's really helpful! So Let's say, I underfitted my model(Simple Linear Regression model) , how should I cure this problem? Change train/test scale?(I usually use the 80/20 variation)

#

@flat quest

wild ridge May 15, 2020, 6:23 PM

#

hi everybody
I'm looking for a labeled text-based dataset with less than 50% accuracy for my uni project
can you help me?

flat quest May 15, 2020, 6:43 PM

#

not necessarily change the train and test sizes, but u could use a more complex model.
for example a multilayer ReLU network will be able to fit progressively more complex relationships, such as non-linear ones with many local maxima and minima

#

@sonic raft

sonic raft May 15, 2020, 6:46 PM

#

Thanks!

charred blaze May 15, 2020, 7:11 PM

#

accuracy at 17%?! that's abysmal

#

something must be wrong with your code itself

stone ruin May 15, 2020, 7:13 PM

#

I imagine there's a bit of bias here, but would the most logical order to learn the 3 languages be Python -> SQL -> R?

charred blaze May 15, 2020, 7:13 PM

#

I would separate SQL from the others

#

you want to know SQL nevertheless

#

you should learn that in parallel to other things

stone ruin May 15, 2020, 7:14 PM

#

Well

charred blaze May 15, 2020, 7:14 PM

#

having that said, what kind of work do you intend to do?

stone ruin May 15, 2020, 7:14 PM

#

Gotta do one at a time

#

pulling csv's out of a GUI built on top of a SQL database

charred blaze May 15, 2020, 7:14 PM

#

I've barely touched R so far in my jobs where I dwelled in machine learning stuff

stone ruin May 15, 2020, 7:14 PM

#

manipulating the data in excel mostly

#

didn't even know VBA

#

but now with all this time, I intend to learn proper data processing / visualization

#

I want to get back into data for banks I guess, gathering insights for what works and what to do next

charred blaze May 15, 2020, 7:15 PM

#

if you want to learn a "real" programming language... go with Python

stone ruin May 15, 2020, 7:15 PM

#

probably some ML to prevent attrition by predicting behaviors that lead to closed accounts

charred blaze May 15, 2020, 7:15 PM

#

R's basically a language created by statisticians and it shows

stone ruin May 15, 2020, 7:15 PM

#

So ultimately tracking customer transaction histories (massive DB of line by line per account)

#

to see what actions trend towards a closed account

#

such as if they stop using a debit card

charred blaze May 15, 2020, 7:16 PM

#

having that said, if you intend to work with time series analysis in general, then I would for sure recommend R

stone ruin May 15, 2020, 7:16 PM

#

or their DD disappears

#

trigger them for a contact from a banker or something

#

but there's 10k - millions of transactions a day depending on the bank size

#

but working with big databases like that to gather insights for customer behaviors to ensure maximum profitability

#

I have 0 coding experience 😦

#

outside of the bit of SQL I had to try to work with during a Salesforce integration

#

although when I caught an error that the expert made I felt pretty good, ahha

#

alright, so I'll go back to learning Python, thanks again! Datacamp is offering a free week, so I wanted to maximize my time with the platform

charred blaze May 15, 2020, 7:19 PM

#

yeah, go with Python + SQL for now.

stone ruin May 15, 2020, 7:20 PM

#

I might go round robin between the two

#

do their intro course to python, then SQL, then do the next python, then next SQL

#

❤️ this server, you all are the best

charred blaze May 15, 2020, 7:23 PM

#

yeah, that seems like a good approach

wind plume May 15, 2020, 7:23 PM

#

Hi guys!! I have an issue with pandas that I'm sure is because I'm new. I don't know if you guys want me to post my code or just a screenshot, but all I can do is a ss for now.

Basically I'm entering in all information from previous dataframes, using user inputs of previous columns!

charred blaze May 15, 2020, 7:23 PM

#

share the code if possible

#

and state specifically what's the problem you're having

wind plume May 15, 2020, 7:23 PM

#

📎 JPEG_20200515_152333.jpg

#

I can in a little bit. Those NaNs shouldn't be there

#

Under dry sample. If I do dry sample first I get NaNs, if I do weathered sample first I get NaNs. I tried doing fillna in that for loop but it never worked

#

I think it's something that the two dataframes indices don't match?? But it's hard to know what question to ask and thus what to search for.
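The index-mismatch guess is a plausible one: pandas aligns on index labels, not on row position, when a Series is assigned as a DataFrame column. Below is a minimal sketch of that behaviour with made-up data. The column names `dry_sample` and `weathered_sample` and the index values are assumptions, since the actual code is only in the screenshot.

```python
import pandas as pd

# Two pieces of data whose index labels do not overlap (hypothetical values).
dry = pd.DataFrame({"dry_sample": [1.0, 2.0, 3.0]}, index=[0, 1, 2])
weathered = pd.Series([10.0, 20.0, 30.0], index=[3, 4, 5], name="weathered_sample")

# Assignment aligns on index labels; none match here, so every row is NaN.
misaligned = dry.copy()
misaligned["weathered_sample"] = weathered
print(misaligned)

# Option 1: assign the raw values, so rows are matched by position.
by_position = dry.copy()
by_position["weathered_sample"] = weathered.to_numpy()

# Option 2: reset the index first so the labels line up (0, 1, 2).
by_reset = dry.copy()
by_reset["weathered_sample"] = weathered.reset_index(drop=True)
print(by_reset)
```

Printing `df.index` for each DataFrame involved is a quick way to check whether this is what is happening.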

wild ridge May 15, 2020, 7:26 PM

#

hi everybody
I'm looking for a labeled text-based dataset on which less than 50% accuracy has ever been achieved, for my uni project
can you help me?

charred blaze May 15, 2020, 7:32 PM

#

that code looks a bit weird

#

you intend to append rows to a dataframe

#

but I think you're always replacing entire columns each time you do those assignments

#

nvm, I can see that you really intend to operate column wise

wind plume May 15, 2020, 7:40 PM

#

I don't know the best way to go about this. Just kinda winging it. I know nothing of coding this is like 1 month of quarantine in fruition

charred blaze May 15, 2020, 7:40 PM

#

but it seems off to me, are you sure that's what you really want to do?

#

what do you want to do then

wind plume May 15, 2020, 7:40 PM

#

What do you mean? Code wise

charred blaze May 15, 2020, 7:40 PM

#

yes, code wise it looks weird

wind plume May 15, 2020, 7:40 PM

#

Inefficient??

charred blaze May 15, 2020, 7:41 PM

#

no, just plain wrong

wind plume May 15, 2020, 7:41 PM

#

Or just like logically doesn't make sense

#

Oh shit ok. Let me get my code I'll be back on in a bit.

wind plume May 15, 2020, 8:26 PM

#

https://pastebin.com/330D1i3c

#

Attached is the pastebin. At the top has an explanation of the code, a tldr of what it's meant to do, and what issue I am having.

I want to note I'm very new to coding and this is my first project. I want to use this for myself as I am a material scientist by trade and WFH I spent all this time learning python.

Every problem I've had so far I addressed by googling questions, but for this one I haven't been able to find many people with the same issue.

#

If you guys can look at this please let me know if there's a fix I can use. If you do look at it and reply, I'd highly appreciate it if you can tag me

radiant nymph May 15, 2020, 8:41 PM

#

Whats the best way to encode text data for building tree models so that we dont get dim curse. I have used target encoding , any alternatives?

flat quest May 15, 2020, 8:41 PM

#

One hot is better for tree based

radiant nymph May 15, 2020, 8:44 PM

#

One hot creates large vectors > large dims > complex trees

flat quest May 15, 2020, 9:12 PM

#

U can always limit tree size
But my bad I meant better for non tree based

Target encoding creates relationships that aren’t there like red is 1 doesn’t mean 2 is blue.

But one hot can make the tree split on color red or blue rather than on the

For trees prob just use target encoding for now, since one hot causes those features to lose importance in the model even when they shouldn't
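For reference, a small sketch of the three encodings that come up in this exchange, in plain Python with made-up data. One terminology note: the "red is 1, blue is 2" scheme is usually called ordinal (label) encoding, whereas target encoding usually means replacing each category with the mean of the target for that category. The `colors` and `target` values are purely illustrative.

```python
colors = ["red", "blue", "green", "red", "blue", "red"]
target = [1, 0, 0, 1, 1, 0]

categories = sorted(set(colors))  # ['blue', 'green', 'red']

# Ordinal (label) encoding: one integer per category. Implies an order
# (blue < green < red) that does not exist in the data.
ordinal = {c: i for i, c in enumerate(categories)}
ordinal_encoded = [ordinal[c] for c in colors]

# One-hot encoding: one 0/1 column per category. No fake order, but the
# number of columns grows with the number of categories.
one_hot_encoded = [[int(c == cat) for cat in categories] for c in colors]

# Target (mean) encoding: replace each category by the mean target value
# seen for it. A single column, whatever the number of categories.
sums, counts = {}, {}
for c, t in zip(colors, target):
    sums[c] = sums.get(c, 0) + t
    counts[c] = counts.get(c, 0) + 1
target_means = {c: sums[c] / counts[c] for c in categories}
target_encoded = [target_means[c] for c in colors]

print(ordinal_encoded)
print(one_hot_encoded[0])
print(target_means)
```

In practice the target means should be computed on training data only, ideally out-of-fold, because they leak the label into the feature otherwise.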