arctic wedgeBOT Sep 10, 2021, 1:41 PM

#

@hasty grail :white_check_mark: Your eval job has completed with return code 0.

001 |          35 function calls (32 primitive calls) in 0.097 seconds
002 | 
003 |    Ordered by: standard name
004 | 
005 |    ncalls  tottime  percall  cumtime  percall filename:lineno(function)
006 |         1    0.000    0.000    0.000    0.000 <__array_function__ internals>:2(concatenate)
007 |         1    0.000    0.000    0.000    0.000 <__array_function__ internals>:2(diff)
008 |         1    0.000    0.000    0.000    0.000 <__array_function__ internals>:2(nonzero)
009 |         1    0.000    0.000    0.097    0.097 <__array_function__ internals>:2(unique)
010 |         3    0.000    0.000    0.000    0.000 _asarray.py:110(asanyarray)
011 |         1    0.000    0.000    0.000    0.000 arraysetops.py:125(_unpack_tuple)
... (truncated - too many lines)

Full output: https://paste.pythondiscord.com/fiducopemu.txt?noredirect

hasty grail Sep 10, 2021, 1:41 PM

#

!e

import cProfile
from collections import Counter
import numpy as np

arr = np.random.randint(0, 100, size=1000000)
with cProfile.Profile() as pr:
    c = Counter(arr)

pr.print_stats()

arctic wedgeBOT Sep 10, 2021, 1:41 PM

#

@hasty grail :white_check_mark: Your eval job has completed with return code 0.

001 |          35 function calls (19 primitive calls) in 0.274 seconds
002 | 
003 |    Ordered by: standard name
004 | 
005 |    ncalls  tottime  percall  cumtime  percall filename:lineno(function)
006 |         1    0.000    0.000    0.274    0.274 __init__.py:581(__init__)
007 |         1    0.000    0.000    0.274    0.274 __init__.py:649(update)
008 |         9    0.000    0.000    0.000    0.000 _collections_abc.py:409(__subclasshook__)
009 |         1    0.000    0.000    0.000    0.000 abc.py:117(__instancecheck__)
010 |       9/1    0.000    0.000    0.000    0.000 abc.py:121(__subclasscheck__)
011 |         1    0.000    0.000    0.000    0.000 cProfile.py:117(__exit__)
... (truncated - too many lines)

Full output: https://paste.pythondiscord.com/liberofige.txt?noredirect

hasty grail Sep 10, 2021, 1:42 PM

#

np.unique is around 3x faster than Counter in this example (0.097 s vs 0.274 s)
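For reference, here is a minimal sketch of the `np.unique` approach profiled above. The code of the first eval isn't shown, so this is an assumption about what it did: `return_counts=True` gives the same value-to-count mapping as `Counter`, with the values sorted.

```python
from collections import Counter

import numpy as np


def value_counts(arr):
    """Count each distinct value with NumPy; returns (sorted values, counts)."""
    return np.unique(arr, return_counts=True)


rng = np.random.default_rng(0)
arr = rng.integers(0, 100, size=1_000)

values, counts = value_counts(arr)
# Same value -> count mapping that Counter(arr) builds, but computed in
# vectorised code instead of a Python-level loop over the elements
as_dict = dict(zip(values.tolist(), counts.tolist()))
print(as_dict == Counter(arr.tolist()))  # True
```

`np.unique` sorts the data, so the speedup will vary with the input; the ~3x above applies to this particular array.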

#

@gaunt marsh

serene scaffold Sep 10, 2021, 2:09 PM

#

import itertools

result = 0
for a, b in itertools.product(A, B):
    result += 1 / abs(a - b)

If we assume that A and B are arrays, is there a way to vectorise this?

hasty grail Sep 10, 2021, 2:10 PM

#

when I see itertools.product I think of np.meshgrid

#

so maybe something like...

#

!e

import numpy as np

A = np.arange(4, 8)
B = np.arange(0, 4)

x, y = np.meshgrid(A, B)
print(x - y)
print((1 / (x - y)))

result = (1 / (x - y)).sum()
print(result)

arctic wedgeBOT Sep 10, 2021, 2:13 PM

#

@hasty grail :white_check_mark: Your eval job has completed with return code 0.

001 | [[4 5 6 7]
002 |  [3 4 5 6]
003 |  [2 3 4 5]
004 |  [1 2 3 4]]
005 | [[0.25       0.2        0.16666667 0.14285714]
006 |  [0.33333333 0.25       0.2        0.16666667]
007 |  [0.5        0.33333333 0.25       0.2       ]
008 |  [1.         0.5        0.33333333 0.25      ]]
009 | 5.076190476190476

hasty grail Sep 10, 2021, 2:14 PM

#

oops forgot the abs

#

but you get the idea
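With the `abs` added back, the same idea can be written with NumPy broadcasting instead of `np.meshgrid`. This is a minimal sketch that keeps the original loop for comparison. It assumes no pair has `a == b`, which would divide by zero.

```python
import itertools

import numpy as np


def sum_inverse_abs_diff(A, B):
    """Vectorised sum of 1 / |a - b| over every pair (a, b) in A x B.

    Assumes no pair has a == b (that would divide by zero).
    """
    A = np.asarray(A, dtype=float)
    B = np.asarray(B, dtype=float)
    # A[:, None] has shape (len(A), 1) and B[None, :] has shape (1, len(B)),
    # so broadcasting builds the full len(A) x len(B) difference matrix
    diff = A[:, None] - B[None, :]
    return (1.0 / np.abs(diff)).sum()


def sum_inverse_abs_diff_loop(A, B):
    # The original loop, kept for comparison
    result = 0
    for a, b in itertools.product(A, B):
        result += 1 / abs(a - b)
    return result


A = np.array([0, 5])
B = np.array([2, 3])
print(sum_inverse_abs_diff(A, B))  # 1/2 + 1/3 + 1/3 + 1/2
print(sum_inverse_abs_diff_loop(A, B))
```

Broadcasting avoids building the two coordinate grids that `np.meshgrid` returns, but the difference matrix is still len(A) x len(B), so very large inputs may need to be processed in chunks.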

desert oar Sep 10, 2021, 2:15 PM

#

a "normal array" would be a list, and yes you can represent rgb_ids and rgb_counts as numpy arrays, use np.array(list(...))
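Here is a small sketch of that conversion. It assumes `rgb_ids` and `rgb_counts` come from a `Counter` of colour ids; the original code isn't shown, so the input here is hypothetical. A `Counter` is a `dict`, so its keys and values come out in the same order and the two arrays stay aligned.

```python
from collections import Counter

import numpy as np


def counter_to_arrays(counter):
    """Split a Counter into aligned NumPy arrays of keys and counts."""
    n = len(counter)
    # np.fromiter skips the intermediate list that np.array(list(...)) builds
    rgb_ids = np.fromiter(counter.keys(), dtype=np.int64, count=n)
    rgb_counts = np.fromiter(counter.values(), dtype=np.int64, count=n)
    return rgb_ids, rgb_counts


# Hypothetical colour ids, e.g. one packed RGB integer per pixel
colour_counts = Counter([3, 1, 3, 2, 3, 1])
rgb_ids, rgb_counts = counter_to_arrays(colour_counts)
```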

serene scaffold Sep 10, 2021, 2:18 PM

#

@hasty grail thanks lemon_hyperpleased

agile jolt Sep 10, 2021, 2:23 PM

#

anyone?

desert oar Sep 10, 2021, 2:29 PM

#

did you get some negative feedback on it? there are 2 main problems with "radial" charts like this:

perceptual non-linearity due to the geometry of circles
lots of wasted space in the middle; the relevant data is squeezed to the outer edges

the value of a radial chart is when the data covers multiple cycles, e.g. a time series that covers 10 years: you can see all 10 years overlaid effectively, without the artificial visual "break" between the end of one cycle and the start of the next

i'd argue that in this particular case, the cyclical nature isn't relevant because this isn't time series data, and the cycles are so short and easy to understand that you don't really get value out of it anyway

#

so my recommendation would be to turn this into a line plot with 2 lines, actual and expected

#

and adjust the y axis

agile jolt Sep 10, 2021, 2:30 PM

#

okay, but I have to keep a polar chart as an example, so any idea which data would be useful for it?

desert oar Sep 10, 2021, 2:32 PM

#

time series that covers multiple cycles

#

so 2 months of data that exhibits a weekly cycle
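As a rough illustration of "2 months of data that exhibits a weekly cycle", each day can be mapped to an angle by its position in the week. Every week then becomes one lap of the circle, and the laps overlay. This is a NumPy-only sketch with a hypothetical daily series. The resulting `theta` and `r` could then be plotted on a Matplotlib axis created with `projection="polar"`.

```python
import numpy as np


def weekly_polar_coords(values, period=7):
    """Map a daily series to polar coordinates with one lap per `period` days.

    Returns (theta, r, cycle): the angle for each day, its value, and which
    lap (week) it belongs to, so each week can be drawn as its own line.
    """
    r = np.asarray(values, dtype=float)
    day = np.arange(len(r))
    theta = 2 * np.pi * (day % period) / period
    cycle = day // period
    return theta, r, cycle


# Hypothetical 8 weeks of daily values with a weekly pattern plus noise
rng = np.random.default_rng(1)
days = np.arange(56)
series = 100 + 20 * np.sin(2 * np.pi * days / 7) + rng.normal(0, 3, size=56)
theta, r, cycle = weekly_polar_coords(series)
```

To close each lap visually, you can repeat the week's first point at angle 2π when plotting.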

agile jolt Sep 10, 2021, 2:35 PM

#

hmm, okay thank you

#

very useful info

lapis sequoia Sep 10, 2021, 2:44 PM

#

best videos in Hindi for data science?

light warren Sep 10, 2021, 3:56 PM

#

hey, can someone help me calculate returns? I have imported a CSV file and I can't calculate the returns. My data looks like this when imported: https://gyazo.com/6838e7287580e232f7783e27d7d4a378


#

every time I use the code return_portfolio = z.pct_change()

#

I get TypeError: unsupported operand type(s) for /: 'str' and 'str'

crisp wing Sep 10, 2021, 5:12 PM

#

Does anyone know why scikit-learn's learning_curve would throw this error?

ConnectionRefusedError: [WinError 10061] No connection could be made because the target machine actively refused it

Using code from
https://scikit-learn.org/stable/auto_examples/model_selection/plot_learning_curve.html
Specifically the problem arises within

    train_sizes, train_scores, test_scores, fit_times, _ = \
        learning_curve(estimator, X, y, cv=cv, n_jobs=n_jobs,
                       train_sizes=train_sizes,
                       return_times=True)

#

It seems to be network-related, but I don't know why that would be the case here

lapis sequoia Sep 10, 2021, 5:32 PM

#

light warren i get typeError: unsupported operand type(s) for /: 'str' and 'str'

You have to convert your data into integer or float

hidden eagle Sep 10, 2021, 5:34 PM

#

Hey guys, I'm a little lost on how to create a vector from an image:

@app.route('/mix_images', methods=['POST'])
def mix_images():
    try:
        img = open("test1.jpeg")
        img.load()
        img2 = open("test2.jpeg")
        img2.load()
        # label1 = np.asarray(json.loads(request.form['label1']), dtype='float64')
        # label2 = np.asarray(json.loads(request.form['label2']), dtype='float64')
        label1 = np.asarray(img, dtype='float64')
        label2 = np.asarray(img2, dtype='float64')
        vector1 = np.asarray(json.loads(request.form['vector1']), dtype='float64')
        vector2 = np.asarray(json.loads(request.form['vector2']), dtype='float64')

        new_vectors, new_labels = interpolate(12, vector1, vector2, label1, label2)
        new_ims = sample(new_vectors, new_labels)

        result = PIL.Image.fromarray(new_ims)
        result.save("test.png")
        return jsonify([
            [encode_img(arr) for arr in new_ims],
            new_vectors.tolist(),
            new_labels.tolist()
        ])
    except Exception as e:
        print(e)
        return '', 500

Can someone point me in the right direction for creating the labels and vectors from an image?

light warren Sep 10, 2021, 5:35 PM

#

lapis sequoia You have to convert your data into integer or float

how do i do that?

hidden eagle Sep 10, 2021, 5:40 PM

#

light warren how do i do that?

Cast

light warren Sep 10, 2021, 5:42 PM

#

my dataframe is called z, so do i do z.pandas.to_numeric()

storm hill Sep 10, 2021, 6:46 PM

#

hidden eagle Hey guys, I'm a little lost on how to create a vector from an image: ```py app....

what do you want to do afterwards ? u need it for neural net ?

lapis sequoia Sep 10, 2021, 6:57 PM

#

light warren my dataframe is called z, so do i do z.pandas.to_numeric()

https://stackoverflow.com/questions/48094854/python-convert-object-to-float

Python convert object to float
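The `TypeError` means the price columns were read as strings (`object` dtype), so `pct_change` ends up dividing `str` by `str`. Note that `pd.to_numeric` is a top-level pandas function, not a DataFrame attribute. Here is a minimal sketch, assuming `z` holds only price columns whose text may contain thousands separators (the actual CSV isn't shown, so the data below is hypothetical).

```python
import pandas as pd


def to_returns(prices: pd.DataFrame) -> pd.DataFrame:
    """Convert text price columns to floats, then compute simple returns."""
    numeric = prices.apply(
        lambda col: pd.to_numeric(
            # Remove thousands separators such as "1,050" before parsing
            col.astype(str).str.replace(",", "", regex=False),
            errors="coerce",  # cells that still can't be parsed become NaN
        )
    )
    # fill_method=None: don't forward-fill gaps before computing the change
    return numeric.pct_change(fill_method=None)


# Hypothetical data shaped like a CSV read with string prices
z = pd.DataFrame({"A": ["100", "110", "121"], "B": ["1,000", "1,100", "990"]})
return_portfolio = to_returns(z)
print(return_portfolio)
```

If the CSV also has a date column, it's usually easiest to make it the index (e.g. `index_col` in `pd.read_csv`) so only price columns are converted.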

digital sparrow Sep 10, 2021, 9:25 PM

#

https://paste.pythondiscord.com/tudebapofe.php

#

Hey guys, I'm having an issue with this pandas script at work.

Somewhere it is screwing up the values in a few select columns.

arctic wedgeBOT Sep 10, 2021, 9:28 PM

#

Hey @digital sparrow!

Uh-oh! It looks like your message got zapped by our spam filter. We currently don't allow .csv attachments, so here are some tips to help you travel safely:

• If you attempted to send a message longer than 2000 characters, try shortening your message to fit within the character limit or use a pasting service (see below)

• If you tried to show someone your code, you can use codeblocks
(run !code-blocks in #bot-commands for more information) or use a pasting service like:

https://paste.pythondiscord.com

safe viper Sep 10, 2021, 11:56 PM

#

I am making a data science project and would like to use data to see if masks have been impactful in limiting covid transmission in the US. How would I tackle this problem practically?

#

It seems simple, but I can't figure out how to actually get data that represents no masks and data that represents masks

#

Would I compare states that lifted their mask mandates early?

desert oar Sep 11, 2021, 12:18 AM

#

You could do that, using something like a regression discontinuity design

lapis sequoia Sep 11, 2021, 12:18 AM

#

Hi, I'm new to the Python Discord channel. I'm having an issue extracting stock data from the Bloomberg API. I installed blpapi and it works, but my goal is to fetch all stock data from the Bloomberg API into a database like MySQL or PostgreSQL. I can't work with Excel because of its row and column limits. If you have any ideas or solutions, please let me know. Thanks

desert oar Sep 11, 2021, 12:18 AM

#

But keep in mind that there's probably a relationship between the state relieving the mask mandate and people's behavior

#

This is called "endogeneity" in econometrics literature

#

You could do something like comparing results on the two sides of a state/county/town line

#

@safe viper ☝️
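Here is a minimal sketch of a sharp regression discontinuity estimate on synthetic data. The running variable `x` is assumed to be a signed distance to a policy boundary (e.g. kilometres across a state line, positive on the mandate side). The estimated effect is the jump between two local linear fits at the cutoff. The bandwidth and data are hypothetical; a real analysis would also need a principled bandwidth choice and uncertainty estimates.

```python
import numpy as np


def rd_jump(x, y, cutoff=0.0, bandwidth=1.0):
    """Estimate the jump in y at `cutoff` with local linear fits on each side."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    left = (x >= cutoff - bandwidth) & (x < cutoff)
    right = (x >= cutoff) & (x <= cutoff + bandwidth)
    # Centre x on the cutoff so each fit's intercept is its value at the boundary;
    # np.polyfit(..., 1) returns [slope, intercept]
    _, at_cutoff_left = np.polyfit(x[left] - cutoff, y[left], 1)
    _, at_cutoff_right = np.polyfit(x[right] - cutoff, y[right], 1)
    return at_cutoff_right - at_cutoff_left


# Synthetic example: outcome trends smoothly with distance but jumps by 3 at the border
rng = np.random.default_rng(2)
x = rng.uniform(-5, 5, size=2_000)
y = 0.5 * x + 3.0 * (x >= 0) + rng.normal(0, 1, size=x.size)
print(rd_jump(x, y, cutoff=0.0, bandwidth=2.0))
```

The endogeneity concern above still applies: the design only helps if nothing else changes abruptly at the same boundary.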

desert oar Sep 11, 2021, 12:20 AM

#

lapis sequoia Hii, im new to python discord chanel. I have facing issue to extract stocks data...

Pretty much every major database has a library in python

#

So you'd just fetch the data you want and save it to the database

#

See #databases
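Here is a minimal sketch of the "fetch, then save" pattern. It uses the standard-library `sqlite3` module as a stand-in for PostgreSQL or MySQL. The table layout and the `rows` are hypothetical placeholders for whatever the Bloomberg request returns. Drivers such as psycopg2 follow the same DB-API `connect` / `executemany` / `commit` pattern, though their placeholder style is `%s` rather than `?`.

```python
import sqlite3


def save_prices(conn, rows):
    """Insert (ticker, date, last_price) rows, creating the table if needed."""
    conn.execute(
        """CREATE TABLE IF NOT EXISTS prices (
               ticker TEXT NOT NULL,
               date TEXT NOT NULL,
               last_price REAL,
               PRIMARY KEY (ticker, date)
           )"""
    )
    # INSERT OR REPLACE lets the same batch be re-run without duplicate-key errors
    conn.executemany("INSERT OR REPLACE INTO prices VALUES (?, ?, ?)", rows)
    conn.commit()


# Hypothetical rows; in practice these would come from the Bloomberg request
rows = [
    ("AAA CN Equity", "2021-01-31", 10.5),
    ("AAA CN Equity", "2021-02-28", 11.0),
    ("BBB CN Equity", "2021-01-31", 42.0),
]
conn = sqlite3.connect(":memory:")
save_prices(conn, rows)
print(conn.execute("SELECT COUNT(*) FROM prices").fetchone()[0])
```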

lapis sequoia Sep 11, 2021, 12:24 AM

#

Okay, thanks. The data is huge; I mean all stocks from the Bloomberg API. Do you think Postgres or MySQL can handle this data? Do you know other options? Anyway, I will ask this question in #databases.

#

@desert oar

desert oar Sep 11, 2021, 12:32 AM

#

lapis sequoia Okk thanx .. the data is huge. I mean all stocks from bloomberg api.. do you thi...

Yes, databases can handle as much data as can fit on your disk

lapis sequoia Sep 11, 2021, 12:33 AM

#

Okay, do you have any idea how to solve this problem?

#

My main issue is selecting all stocks from the Bloomberg API.

#

Here is the code... I'm not sure about it:

#

tickets = ['a', 'b',.....,'z']
blp.bdh(tickers=tickets, flds=['Last_Price','EQY_FUND_CRNCY=CAD'], start_date='2020-12-31', end_date='2021-05-31', Per='M')

#

Actually, the tickers should be stock names, but the names are always in alphabetical order.

#

@desert oar

hidden eagle Sep 11, 2021, 1:14 AM

#

storm hill what do you want to do afterwards ? u need it for neural net ?

Yes, it's for a GAN

glad mulch Sep 11, 2021, 2:18 AM

#

lapis sequoia tickets = ['a', 'b',.....,'z'] blp.bdh( tickers=tickets flds=['Last_Price','EQY_...

i have used bloomberg before

#

whats your problem

hasty grail Sep 11, 2021, 3:09 AM

#

hidden eagle Yes, it's for a gan network

You should use libraries such as OpenCV or PIL to open images. Deep learning libraries may also have such methods built in

hidden eagle Sep 11, 2021, 3:10 AM

#

I

hidden eagle Sep 11, 2021, 3:10 AM

#

hasty grail You should use libraries such as OpenCV or PIL to open images. Deep learning lib...

I'm using PIL.

hasty grail Sep 11, 2021, 3:10 AM

#

You could just use PIL.Image.open instead of the builtin open method then
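Here is a minimal sketch of that suggestion, assuming Pillow and NumPy are installed. `PIL.Image.open` decodes the file, and `np.asarray` turns the decoded image into a height x width x 3 array of `uint8` pixel values. The example writes a tiny image to memory so it runs without `test1.jpeg`; in the route you would pass the file path instead.

```python
import io

import numpy as np
from PIL import Image


def load_rgb_array(path_or_file):
    """Open an image with Pillow and return its pixels as an (H, W, 3) uint8 array."""
    with Image.open(path_or_file) as img:
        # convert("RGB") forces decoding and normalises greyscale/RGBA to 3 channels
        return np.asarray(img.convert("RGB"))


# Build a 4x3 red PNG in memory (stand-in for test1.jpeg)
buffer = io.BytesIO()
Image.new("RGB", (4, 3), (255, 0, 0)).save(buffer, format="PNG")
buffer.seek(0)

pixels = load_rgb_array(buffer)
print(pixels.shape, pixels.dtype)  # (3, 4, 3) uint8
```

Note the axis order: Pillow sizes are (width, height), while the array is (height, width, channels).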

hidden eagle Sep 11, 2021, 3:11 AM

#

kk, let me try that and harrass you later. Thanks mate!

hasty grail Sep 11, 2021, 3:12 AM

#

np

hidden eagle Sep 11, 2021, 3:14 AM

#

hasty grail np

I need to ask one question. I'm on the software side (higher level), so this doesn't make a lot of sense to me. Can you explain what exactly is going on from a computer's point of view?

#

Is the represented vector here a "direction" in what the bytes are?

hasty grail Sep 11, 2021, 3:15 AM

#

Without looking at your JSON, idk what your "vectors" are

tender hearth Sep 11, 2021, 3:16 AM

#

are you creating a vector of the rows of pixels in your image?

hidden eagle Sep 11, 2021, 3:16 AM

#

hasty grail Without looking at your JSON, idk what your "vectors" are

I don't have the json I have the DB entry:

hidden eagle Sep 11, 2021, 3:17 AM

#

tender hearth are you creating a vector of the rows of pixels in your image?

From what I understand, the image (byte array) is being turned into a vector?

#

What I need help understanding is how TensorFlow "views" a byte array / vector of data.

tender hearth Sep 11, 2021, 3:18 AM

#

from a computer science perspective a vector is just a fixed sized array

hidden eagle Sep 11, 2021, 3:19 AM

#

tender hearth from a computer science perspective a vector is just a fixed sized array

Yes, but what does it represent?

#

Because an image is a byte array.

#

It means something that I don't understand.

#

I understand it's an array of values, but I just don't know what they are / how to convert an image to those values 😄

desert oar Sep 11, 2021, 3:26 AM

#

@hidden eagle you might need to turn the PIL image data into a numpy array of RGB(A) pixels

hidden eagle Sep 11, 2021, 3:26 AM

#

kk, I'll give that a try.

desert oar Sep 11, 2021, 3:26 AM

#

That or these already are RGBA pixels

#

Note that "vector" is an overloaded term

#

In math it's best to think of a "vector" as a collection of numbers that live in a "space" with a coordinate system

#

And even that very abstract idea is fudging the real version

hasty grail Sep 11, 2021, 3:31 AM

#

Can you describe what 'label1' and 'vector1' (as queried from your database) represent? I am still confused.

#

An image is represented by an array of pixels

#

The pixel values are normally either encoded as byte values (0-255) or floats (0-1)
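To make the two encodings concrete: a decoded RGB image is an array of shape (height, width, 3). The 0-255 byte form converts to the 0-1 float form by dividing by 255. Here is a minimal NumPy sketch; the tiny pixel array is made up for illustration.

```python
import numpy as np


def bytes_to_unit_float(pixels):
    """Convert uint8 pixel values (0-255) to float32 values in [0, 1]."""
    pixels = np.asarray(pixels)
    if pixels.dtype != np.uint8:
        raise TypeError(f"expected uint8 pixels, got {pixels.dtype}")
    return pixels.astype(np.float32) / 255.0


def unit_float_to_bytes(pixels):
    """Inverse conversion, clipping to [0, 1] and rounding back to uint8."""
    scaled = np.clip(np.asarray(pixels, dtype=np.float32), 0.0, 1.0) * 255.0
    return np.round(scaled).astype(np.uint8)


# One row with two pixels: black and an arbitrary colour (hypothetical values)
img = np.array([[[0, 0, 0], [128, 255, 64]]], dtype=np.uint8)  # shape (1, 2, 3)
as_float = bytes_to_unit_float(img)
```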

hidden eagle Sep 11, 2021, 3:44 AM

#

No, I understand what a vector is. I'm asking what the vector is in this case, not the literal meaning of a vector. Here's the code:

const image1 = await knex.from('image').where({ key:key1 }).first()
const image2 = await knex.from('image').where({ key:key2 }).first()

if (image1.size != image2.size) {
    return res.status(400).send('Cannot mix images of different sizes.')
}
const url = (image1.size == 128) ? secrets.ganurl128 : secrets.ganurl256
const [ imgs, vectors, labels ] = await request({
    url: url+'/mix_images',
    method: 'POST',
    json: true,
    form: {
        label1: JSON.stringify(image1.label),
        label2: JSON.stringify(image2.label),
        vector1: JSON.stringify(image1.vector),
        vector2: JSON.stringify(image2.vector)
    }
})

#

That's the API that handles the request.

#

and how I'm handling / trying to handle it:

img = PIL.Image.open("test1.jpeg")
img.load()
img2 = PIL.Image.open("test2.jpeg")
img2.load()
# label1 = np.asarray(json.loads(request.form['label1']), dtype='float64')
# label2 = np.asarray(json.loads(request.form['label2']), dtype='float64')
label1 = np.asarray(img, dtype='float64')
label2 = np.asarray(img2, dtype='float64')
vector1 = np.asarray(json.loads(request.form['vector1']), dtype='float64')
vector2 = np.asarray(json.loads(request.form['vector2']), dtype='float64')

#

https://github.com/joel-simon/ganbreeder

GitHub - joel-simon/ganbreeder: Breed and share images using biggan

hidden eagle Sep 11, 2021, 3:46 AM

#

hasty grail Can you describe what do 'label1' and 'vector1' (as queried from your database) ...

The image that I posted of the db shows what the label and vectors are, I'm literally asking the same thing 😄

hasty grail Sep 11, 2021, 3:57 AM

#

Oh, I thought you had knowledge of the contents of the database you're accessing

hidden eagle Sep 11, 2021, 3:57 AM

#

I do, but it's been converted with the np.asarray.

#

I think the confusion here is I don't really understand what the data is.

#

An image is a byte array; from what I understand from the code, it's being converted to a vector (some other form of data) that I don't understand.

hasty grail Sep 11, 2021, 3:59 AM

#

What is the original source of the data?

#

(Before it is put into the database)

hidden eagle Sep 11, 2021, 3:59 AM

#

    const ids = await knex('image').insert(insert).returning('id')
    if (secrets.local_images) {
        for (let i = 0; i < imgs.length; i++) {
            const buf = new Buffer(imgs[i].replace(/^data:image\/\w+;base64,/, ""),'base64')
            fs.writeFileSync("public/img/"+insert[i].key+".jpeg", buf);
        }
    } else {
        // don't worry about this, this is the S3 save
        const uploads = []
        for (let i = 0; i < imgs.length; i++) {
            const buf = new Buffer(imgs[i].replace(/^data:image\/\w+;base64,/, ""),'base64')
            uploads.push(s3.upload({
                Bucket: bucket,
                Key: `imgs/${insert[i].key}.jpeg`,
                Body: buf,
                ACL: 'public-read',
                ContentEncoding: 'base64',
                ContentType: 'image/jpeg'
            }).promise())
        }
        await Promise.all(uploads)
    }
    return insert.map(({ key }) => ({ key }))
}

hidden eagle Sep 11, 2021, 4:00 AM

#

hasty grail (Before it is put into the database)

const buf = new Buffer(imgs[i].replace(/^data:image\/\w+;base64,/, ""),'base64')

#

The API is sent a file, which it handles

#

It saves the file as a JPEG and stores the vector in the db

hasty grail Sep 11, 2021, 4:01 AM

#

I assume that the API backend* is where the 'label' and 'vector' are generated

hidden eagle Sep 11, 2021, 4:02 AM

#

hasty grail I assume that the API backend* is where the 'label' and 'vector' are generated


app.post('/mix_images', async (req, res) => {
    const key1 = req.body.key1
    const key2 = req.body.key2
    if (!key1 || !key2) return res.sendStatus(400)
    try {
        const image1 = await knex.from('image').where({ key:key1 }).first()
        const image2 = await knex.from('image').where({ key:key2 }).first()

        if (image1.size != image2.size) {
            return res.status(400).send('Cannot mix images of different sizes.')
        }
        const url = (image1.size == 128) ? secrets.ganurl128 : secrets.ganurl256
        const [ imgs, vectors, labels ] = await request({
            url: url+'/mix_images',
            method: 'POST',
            json: true,
            form: {
                label1: JSON.stringify(image1.label),
                label2: JSON.stringify(image2.label),
                vector1: JSON.stringify(image1.vector),
                vector2: JSON.stringify(image2.vector)
            }
        })
        const children = await save_results({ imgs, vectors, labels,
                                              size: image1.size,
                                              parent1: image1.id,
                                              parent2: image2.id })
        return res.json(children)
    } catch(err) {
        console.log('Error: /mix', err)
        return res.sendStatus(500)
    }
})

hasty grail Sep 11, 2021, 4:03 AM

#

Need to know how exactly are image.label and image.vector generated

#

If you can't access the internals and there is no public documentation for it, well...

lapis sequoia Sep 11, 2021, 4:10 AM

#

is it possible to get sweet randal out of the html text <a class="linkFriend" href="https://steamcommunity.com/id/bepri">sweet randal</a>

hidden eagle Sep 11, 2021, 4:10 AM

#

hasty grail Need to know how exactly are `image.label` and `image.vector` generated

const image1 = await knex.from('image').where({ key:key1 }).first()
const image2 = await knex.from('image').where({ key:key2 }).first()

if (image1.size != image2.size) {
    return res.status(400).send('Cannot mix images of different sizes.')
}
const url = (image1.size == 128) ? secrets.ganurl128 : secrets.ganurl256
const [ imgs, vectors, labels ] = await request({
    url: url+'/mix_images',
    method: 'POST',
    json: true,
    form: {
        label1: JSON.stringify(image1.label),
        label2: JSON.stringify(image2.label),
        vector1: JSON.stringify(image1.vector),
        vector2: JSON.stringify(image2.vector)
    }
})

#

That's how they are generated.

#

And the save above.

hidden eagle Sep 11, 2021, 4:11 AM

#

lapis sequoia is it possible to get sweet randal out of the html text ```<a class="linkFriend"...

Beautiful Soup.

lapis sequoia Sep 11, 2021, 4:11 AM

#

I tried but i've only gotten to

#

def extract_members(self, response):
    for href in response.css('.linkFriend::attr(href)'):
        yield { 'href': href.extract() }

hasty grail Sep 11, 2021, 4:11 AM

#

hidden eagle That's how they are generated.

That's just querying the database

lapis sequoia Sep 11, 2021, 4:11 AM

#

and that just saves the link of their profile in the href

#

@hidden eagle
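Here is a minimal sketch of pulling both the link text and the `href` out of that anchor. It uses only the standard-library `html.parser`, so it runs without extra installs. With Beautiful Soup, the text is `soup.select_one("a.linkFriend").get_text()`. In a Scrapy callback, the text node is selected with the `::text` pseudo-element, e.g. `response.css("a.linkFriend::text").getall()`.

```python
from html.parser import HTMLParser


class LinkFriendParser(HTMLParser):
    """Collect the text and href of every <a class="linkFriend"> element."""

    def __init__(self):
        super().__init__()
        self.friends = []
        self._current = None  # dict being filled while inside a matching <a>

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        classes = (attrs.get("class") or "").split()
        if tag == "a" and "linkFriend" in classes:
            self._current = {"href": attrs.get("href"), "name": ""}

    def handle_data(self, data):
        if self._current is not None:
            self._current["name"] += data

    def handle_endtag(self, tag):
        if tag == "a" and self._current is not None:
            self._current["name"] = self._current["name"].strip()
            self.friends.append(self._current)
            self._current = None


html = '<a class="linkFriend" href="https://steamcommunity.com/id/bepri">sweet randal</a>'
parser = LinkFriendParser()
parser.feed(html)
parser.close()
print(parser.friends)
```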

hidden eagle Sep 11, 2021, 4:11 AM

#

hasty grail That's just querying the database

Yes and this is how it saves to the db

const ids = await knex('image').insert(insert).returning('id')
if (secrets.local_images) {
    for (let i = 0; i < imgs.length; i++) {
        const buf = new Buffer(imgs[i].replace(/^data:image\/\w+;base64,/, ""),'base64')
        fs.writeFileSync("public/img/"+insert[i].key+".jpeg", buf);
    }
}

#

It's a buffer array

lapis sequoia Sep 11, 2021, 4:12 AM

#

that's JS, not Python

hidden eagle Sep 11, 2021, 4:12 AM

#

lapis sequoia thats js not python

I know.

hasty grail Sep 11, 2021, 4:12 AM

#

I don't see the strings 'label' and 'vector'

hidden eagle Sep 11, 2021, 4:13 AM

#

hasty grail I don't see the strings 'label' and 'vector'

module.exports = async function({ imgs, vectors, labels, size, parent1=null, parent2=null}) {
    const insert = []
    for (var i = 0; i < imgs.length; i++) {
        insert.push({
            parent1, parent2, size,
            key: randomString(12),
            label: labels[i],
            vector: vectors[i],
        })
    }

This is js too

hasty grail Sep 11, 2021, 4:14 AM

#

You need to trace where 'vectors' comes from when the function is called

hidden eagle Sep 11, 2021, 4:15 AM

#

Basically the js api sends the data to the db and the python is handling the data with TF

#

async function main() {
    const size = 256
    const [ imgs, vectors, labels ] = await request({
        url: secrets.ganurl256+'/random',
        method: 'POST',
        json: true,
        form: { num: '24' }
    })
    await save_results({ imgs, vectors, labels, size })
    console.log('all done')
}

hasty grail Sep 11, 2021, 4:20 AM

#

I suspect that the vector is the seed passed to the GAN

#

i.e. the noise vector

hidden eagle Sep 11, 2021, 4:21 AM

#

hasty grail I suspect that the vector is the seed passed to the GAN

half correct in the idea that the seed is also the image.

#

"noise vector" how does that work from a data sci explanation?

#

Is that the weight of a pixel?

hasty grail Sep 11, 2021, 4:23 AM

#

The training loop begins with generator receiving a random seed as input. That seed is used to produce an image. The discriminator is then used to classify real images (drawn from the training set) and fakes images (produced by the generator). The loss is calculated for each of these models, and the gradients are used to update the generator and discriminator.

From TensorFlow's GAN tutorial
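The noise vector is not a per-pixel weight. The generator maps the whole vector (plus, here, a class label) to a whole image. If `vector` is indeed that noise input and `label` its class vector, then "mixing" two images amounts to blending those inputs and running the generator on each blend. Here is a minimal NumPy sketch of linear interpolation. It is a hypothetical stand-in, not ganbreeder's actual `interpolate` function, and the 128 / 1000 sizes are illustrative assumptions.

```python
import numpy as np


def interpolate_inputs(num, vector1, vector2, label1, label2):
    """Return `num` evenly spaced blends from parent 1 to parent 2 (inclusive)."""
    vector1, vector2 = np.asarray(vector1, float), np.asarray(vector2, float)
    label1, label2 = np.asarray(label1, float), np.asarray(label2, float)
    # t has shape (num, 1) so it broadcasts across each vector's length
    t = np.linspace(0.0, 1.0, num)[:, None]
    vectors = (1 - t) * vector1 + t * vector2
    labels = (1 - t) * label1 + t * label2
    return vectors, labels


rng = np.random.default_rng(0)
z_dim, n_classes = 128, 1000  # illustrative sizes, not taken from the project
vector1, vector2 = rng.standard_normal(z_dim), rng.standard_normal(z_dim)
label1, label2 = np.zeros(n_classes), np.zeros(n_classes)
label1[3], label2[7] = 1.0, 1.0  # one-hot labels for two arbitrary classes

new_vectors, new_labels = interpolate_inputs(12, vector1, vector2, label1, label2)
print(new_vectors.shape, new_labels.shape)  # (12, 128) (12, 1000)
```

Each row of `new_vectors` / `new_labels` would then be fed to the generator to produce one child image.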

hidden eagle Sep 11, 2021, 4:25 AM

#

Basically:

app.post('/mix_images', async (req, res) => {
    const key1 = req.body.key1
    const key2 = req.body.key2

that's pulled from:

form.addEventListener('submit', event => {
    event.preventDefault()
    const parent1 = form.querySelector('.parent1').value
    const parent2 = form.querySelector('.parent2').value
    let key1, key2
    try {
        key1 = get_key(parent1)
        key2 = get_key(parent2)
    } catch(e) {
        alert('URL not valid.')
        console.log(e)
        return
    }

and

form.mixinput(style='text-align:center;')
    input.parent1(type='text', placeholder='Enter url', required)
    input.parent2(type='text', placeholder='Enter url', required)

    br
    input.submit(type='submit', value='Submit')

hasty grail Sep 11, 2021, 4:25 AM

#

You should take a look at the 'random' endpoint

hidden eagle Sep 11, 2021, 4:26 AM

#

That actually does do a noise function, you are correct.

hasty grail Sep 11, 2021, 4:26 AM

#

Since vectors is returned from the POST request sent there

hidden eagle Sep 11, 2021, 4:26 AM

#

"Noise" being the model data:

async function main() {
    const size = 256
    const [ imgs, vectors, labels ] = await request({
        url: secrets.ganurl256+'/random',
        method: 'POST',
        json: true,
        form: { num: '24' }
    })
    await save_results({ imgs, vectors, labels, size })
    console.log('all done')
}

hidden eagle Sep 11, 2021, 4:26 AM

#

hasty grail Since `vectors` is returned from the POST request sent there

That's how I generated the database btw

#

By calling the random endpoint.

#

docker-compose exec server node make_randoms.js

arctic wedgeBOT Sep 11, 2021, 5:57 AM

#

:incoming_envelope: :ok_hand: applied mute to @lapis sequoia until <t:1631340444:f> (9 minutes and 59 seconds) (reason: duplicates rule: sent 4 duplicated messages in 10s).

boreal wasp Sep 11, 2021, 10:20 AM

#

Hi I have a question...

#

#

I'm using sqlalchemy and created a table called TweetWordPair

#

The table include 2 columns, "tweet_id" and "word"

#

But I want to display another column, "ftd", computed in the query from the selected values

#

I want to show the result of the values in ftd

#

but I get an add_columns error...

#

Does anyone have any suggestions? I don't want to create a new column in the table

#

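from sqlalchemy import func
from sqlalchemy.orm import sessionmaker

Session = sessionmaker(bind=engine)
session = Session()

# Build the aggregate as a labelled column expression and select it alongside
# the other columns (session.query is a method, so it has no .add_columns here)
ftd = func.count(TweetWordPair.word).label("ftd")
result = (
    session.query(TweetWordPair.tweet_id, TweetWordPair.word, ftd)
    # every non-aggregated selected column must appear in GROUP BY
    .group_by(TweetWordPair.tweet_id, TweetWordPair.word)
    .all()
)
for r in result:
    print(r)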
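If `ftd` is meant as the term frequency f(t, d), i.e. how many times each word occurs in each tweet (an assumption based on the name), the count belongs in the same query, and both selected columns must be in the `GROUP BY`. Here is a minimal sketch of the equivalent SQL, using the standard-library `sqlite3` module with a hypothetical `tweet_word_pair` table and made-up rows.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tweet_word_pair (tweet_id INTEGER, word TEXT)")
conn.executemany(
    "INSERT INTO tweet_word_pair VALUES (?, ?)",
    [(1, "data"), (1, "data"), (1, "science"), (2, "data")],
)

# Same shape as session.query(tweet_id, word, func.count(word).label("ftd"))
#     .group_by(tweet_id, word)
rows = conn.execute(
    """SELECT tweet_id, word, COUNT(word) AS ftd
       FROM tweet_word_pair
       GROUP BY tweet_id, word
       ORDER BY tweet_id, word"""
).fetchall()
for row in rows:
    print(row)
```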

lapis sequoia Sep 11, 2021, 11:32 AM

#

glad mulch whats your problem

My issue is that I want to extract all stocks from Bloomberg, and I don't know how to do it. I'm using blpapi. My goal is to fetch all the stock data into a database like MySQL or PostgreSQL. Any ideas?

agile jolt Sep 11, 2021, 12:50 PM

#

Can someone explain calculating outliers with the standard deviation?
Just the theory part, no code needed

#

To be more specific, I was given a thesis:
"We can calculate outliers in 2 ways: 1. with IQR, 2. with +/- 3 st.dev based on normal distribution. Explain!"

#

Basically, I just need the second way

prime hearth Sep 11, 2021, 12:58 PM

#

i think +/- is just #1

agile jolt Sep 11, 2021, 1:02 PM

#

Wdym, like it's self explanatory?

#

Nvm, i figured it out

harsh bear Sep 11, 2021, 1:09 PM

#

So I have a VPS where I'm plotting some of my bot's data on a graph. The problem is, it's deployed on the VPS, so how can I see the graph live on my PC?

lapis sequoia Sep 11, 2021, 2:28 PM

#

What would be required to build the world's best general chatbot, with long-term memory, that learns about you and remembers everything? Basically like the movie “Her”

quiet vault Sep 11, 2021, 2:54 PM

#

Does anyone have code for Bayesian Optimization for time series data with deep learning?

desert oar Sep 11, 2021, 3:38 PM

#

agile jolt To be more specific I was given a thesis: ''We can calculate outliers in 2 ways:...

They are defining an outlier as a point more than three standard deviations away from the mean, assuming the data is normally distributed
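Here is a minimal NumPy sketch of both rules from the thesis, on made-up data. The 3-standard-deviation rule flags points whose z-score |x - mean| / std exceeds 3. The IQR rule flags points outside [Q1 - 1.5·IQR, Q3 + 1.5·IQR]. The 1.5 multiplier is the conventional (Tukey) choice and an assumption here, since the thesis doesn't specify it.

```python
import numpy as np


def zscore_outliers(x, k=3.0):
    """Flag points more than k standard deviations from the mean."""
    x = np.asarray(x, dtype=float)
    z = (x - x.mean()) / x.std()
    return np.abs(z) > k


def iqr_outliers(x, k=1.5):
    """Flag points outside [Q1 - k*IQR, Q3 + k*IQR]."""
    x = np.asarray(x, dtype=float)
    q1, q3 = np.percentile(x, [25, 75])
    iqr = q3 - q1
    return (x < q1 - k * iqr) | (x > q3 + k * iqr)


data = np.append(np.arange(1, 21), 100)  # 1..20 plus one extreme value
print(data[zscore_outliers(data)])  # [100]
print(data[iqr_outliers(data)])  # [100]
```

The z-score rule relies on the normality assumption, and the extreme value itself inflates the mean and standard deviation. The IQR rule is based on quartiles, so it is less affected by the outliers it is trying to find.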

harsh bear Sep 11, 2021, 4:30 PM

#

harsh bear So i have a vps, where im plotting some data of my bot on a graph. problem is, i...

..

muted falcon Sep 11, 2021, 5:59 PM

#

Hey guys,
Someone I know made a course and uploaded it on his channel:
https://www.youtube.com/channel/UC-d9OPv09mS0lNkbVIb5WBg
Let's support him!

He has a course on Deep Learning with pytorch covering theory and implementation

Omar Zayed

misty flint Sep 11, 2021, 7:04 PM

#

i recommend Alexey Grigorev

#

he also has an online course and a bunch of stuff on github

#

if we're doing rec's

royal crest Sep 12, 2021, 1:24 AM

#

!rule 6

arctic wedgeBOT Sep 12, 2021, 1:24 AM

#

Rules

6. Do not post unapproved advertising.

rain wadi Sep 12, 2021, 6:32 AM

#

just started out with neural networks and i was wondering something about the opencv library

#

i wanted to make my own computer vision library just for playing around

#

but I can't figure out how OpenCV understands the "x y" coordinates of an object on the camera/screen

#

for example, I know how to make the neural network detect whether there's a pattern/object in the image/frame, but I can't figure out how to get its position on the screen

#

should I train the model with the object in different positions?

#

or is there a smarter way I'm missing?

#

if anyone can suggest an article that explains this, I'd be happy to read it

near spindle Sep 12, 2021, 7:27 AM

#

Is it worth buying a bundle of ML-related books on Humble Bundle?

lapis sequoia Sep 12, 2021, 8:02 AM

#

near spindle Is it worth to buy a bundle of ML related books on Humble Bundle?

never came across any of their books so don't know if i can recommend it or not, but usually manning and some O’Reilly books are good

#

the first thing is whether you'll actually read them; second, whether they are written well; and lastly, whether you'll actually learn from them

near spindle Sep 12, 2021, 8:06 AM

#

lapis sequoia the first thing is if you'll actually read them; second is if they are written w...

I think so, depends on my time
I'd like to know that too
Probably yes since I haven't read any ML related stuff since I'm not in college yet but I want to prepare and learn something

lapis sequoia Sep 12, 2021, 8:08 AM

#

Then I'd recommend these books:

Python for Data Analysis: Data Wrangling with Pandas, NumPy, and IPython 2nd Edition
Introduction to Machine Learning with Python: A Guide for Data Scientists
Deep Learning with Python, Second Edition

lapis sequoia Sep 12, 2021, 8:08 AM

#

near spindle 1. I think so, depends on my time 2. I'd like to know that too 3. Probably yes s...

those books are quite good imo; obviously you can find them online

near spindle Sep 12, 2021, 8:09 AM

#

I'm looking rn for any fragments to see how well or badly it's written

near spindle Sep 12, 2021, 8:43 AM

#

They seem to be quite well written from what I saw

#

I think I'll buy them

quick relic Sep 12, 2021, 10:37 AM

#

Hi, I am trying to train my CNN, but the loss does not decrease and the test results are always the same

#

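    def train_loop(self):
        # Switch layers such as dropout and batch norm to training behaviour
        self.model.train()

        for batch, (X,y) in tqdm(enumerate(self.train_dataloader), desc="TRAINING"):
            X = X.to(self.device)
            y = y.to(self.device)

            self.optimizer.zero_grad()
            pred = self.model(X)
            loss = self.loss_fn(pred, y)

            loss.backward()
            self.optimizer.step()
        # Note: this prints only the loss of the last batch in the epoch
        print(f"Result\nloss: {loss.item()}")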

#

    def test_loop(self):
        self.model.eval()  # switch dropout/batch norm to inference behaviour
        size = len(self.test_dataloader.dataset)
        num_batches = len(self.test_dataloader)
        test_loss = 0
        correct = 0

        with torch.no_grad():
            for batch, (X,y) in tqdm(enumerate(self.test_dataloader), desc="TESTING"):
                X = X.to(self.device)
                y = y.to(self.device)

                pred = self.model(X)
                test_loss += self.loss_fn(pred, y).item()
                correct += (pred.argmax(1) == y).type(torch.float).sum().item()

        test_loss /= num_batches
        correct /= size
        print(f"Result\nAccuracy: {(100*correct):>0.1f}%, Avg loss: {test_loss:>8f}")

#

            self.model = Network().to(self.device)
            self.data_train = ImageSet(img_path, annotations_file_name, transform=ToTensor(), train=True)
            self.data_test = ImageSet(img_path, annotations_file_name, transform=ToTensor(), train=False)

            self.train_dataloader = torch.utils.data.DataLoader(self.data_train, batch_size=32, shuffle=True)
            self.test_dataloader = torch.utils.data.DataLoader(self.data_test, batch_size=32, shuffle=True)
            #self.test_dataloader = torch.utils.data.DataLoader(self.data_train, batch_size=16, shuffle=True)
        
            #training params
            self.epochs = 10
            self.learning_rate = 1e-2  # high for Adam; PyTorch's default lr for Adam is 1e-3
            self.loss_fn = nn.CrossEntropyLoss()
            self.optimizer = torch.optim.Adam(self.model.parameters(), lr=self.learning_rate)

#

#

I assume there's some stupid mistake in the train function, but I can't spot it

hoary wigeon Sep 12, 2021, 11:12 AM

#

help me!

#

I have created a model using the Decision Tree algorithm on imbalanced data

#

#

How can I make a decision tree ignore a few points, like SVM and logistic regression do?

velvet thorn Sep 12, 2021, 11:29 AM

#

hoary wigeon how can i make decision tree to ignore few points, like svm and logistic

what do you mean

#

ignore points

hoary wigeon Sep 12, 2021, 11:30 AM

#

like in SVC, where a few points are misclassified

#

so we have a parameter C in svc

velvet thorn Sep 12, 2021, 11:31 AM

#

hoary wigeon so we have a parameter C in svc

ah, you're talking about regularisation strength

#

normally, to decrease variance in decision trees, some sort of ensembling (e.g. bagging, as in random forests) is used
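For decision trees, the usual variance-reduction ensemble is bagging, which random forests build on. A minimal scikit-learn sketch, assuming a synthetic imbalanced dataset stands in for the real data, comparing a single tree with a forest under cross-validation:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for an imbalanced binary dataset (roughly 75% / 25%)
X, y = make_classification(n_samples=1000, n_features=10, weights=[0.75], random_state=0)

models = {
    "single tree": DecisionTreeClassifier(random_state=0),
    # A random forest averages many trees, each fit on a bootstrap sample
    # with a random subset of features considered at every split
    "random forest": RandomForestClassifier(n_estimators=200, random_state=0),
}

results = {}
for name, model in models.items():
    # Balanced accuracy averages per-class recall, so the majority class can't dominate it
    scores = cross_val_score(model, X, y, cv=5, scoring="balanced_accuracy")
    results[name] = scores.mean()
    print(f"{name}: mean={scores.mean():.3f}, std={scores.std():.3f}")
```

The forest's fold-to-fold spread (`std`) is the thing to compare; exact numbers depend on the data.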

hoary wigeon Sep 12, 2021, 11:31 AM

#

I thought it was a critical point

velvet thorn Sep 12, 2021, 11:32 AM

#

hoary wigeon i thought it is a critical point

can you elaborate on this

hoary wigeon Sep 12, 2021, 11:32 AM

#

velvet thorn can you elaborate on this

I'm talking about C

velvet thorn Sep 12, 2021, 11:32 AM

#

hoary wigeon im saying about `C`

what about it

hoary wigeon Sep 12, 2021, 11:32 AM

#

hold on

#

velvet thorn Sep 12, 2021, 11:33 AM

#

velvet thorn ah, you're talking about regularisation strength

yes, isn't that what I said?

hoary wigeon Sep 12, 2021, 11:33 AM

#

yeah, it's a regularization param

#

Is there something like this in decision trees?

velvet thorn Sep 12, 2021, 11:33 AM

#

hoary wigeon yeah it is an regularization param

do you know how it works?

velvet thorn Sep 12, 2021, 11:33 AM

#

hoary wigeon is there something like this in decision tree ?

not really.

#

I mean

#

you can change the max depth of the tree

#

or how it splits
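The closest thing a tree has to SVC's `C` is a set of limits on how far it grows. A minimal sketch showing how `max_depth`, `min_samples_leaf`, and cost-complexity pruning (`ccp_alpha`) in scikit-learn's `DecisionTreeClassifier` shrink the tree; the synthetic data and the parameter values are arbitrary choices for illustration:

```python
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=1000, n_features=10, weights=[0.75], random_state=0)

settings = {
    "unconstrained": {},
    "max_depth=3": {"max_depth": 3},
    "min_samples_leaf=20": {"min_samples_leaf": 20},
    "ccp_alpha=0.01": {"ccp_alpha": 0.01},
}

leaves = {}
for name, params in settings.items():
    tree = DecisionTreeClassifier(random_state=0, **params).fit(X, y)
    leaves[name] = tree.get_n_leaves()
    # Fewer leaves: a simpler model that tolerates some misclassified training points
    print(f"{name:>20}: leaves={leaves[name]}, train acc={tree.score(X, y):.3f}")
```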

hoary wigeon Sep 12, 2021, 11:33 AM

#

yeah i know about svc and logistic

velvet thorn Sep 12, 2021, 11:33 AM

#

but they're fundamentally different models

hoary wigeon Sep 12, 2021, 11:34 AM

#

yep

velvet thorn Sep 12, 2021, 11:34 AM

#

hoary wigeon yeah i know about svc and logistic

no, I mean about the mathematical nature of support vector machines

hoary wigeon Sep 12, 2021, 11:34 AM

#

yeah bruh..

#

I read all the intuition

velvet thorn Sep 12, 2021, 11:34 AM

#

okay, then you should understand what I'm saying 🙂

hoary wigeon Sep 12, 2021, 11:34 AM

#

sure

#

#

So how can I fix this?

velvet thorn Sep 12, 2021, 11:36 AM

#

velvet thorn normally, to decrease variance in decision trees, some sort of stacking is used

see this

velvet thorn Sep 12, 2021, 11:36 AM

#

velvet thorn you can change the max depth of the tree

or this

velvet thorn Sep 12, 2021, 11:36 AM

#

velvet thorn or how it splits

or this

hoary wigeon Sep 12, 2021, 11:36 AM

#

I tried GridSearchCV

#

with stratified CV
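A sketch of that kind of search on hypothetical synthetic data. The main point is `scoring`: with imbalanced classes, optimising plain accuracy tends to favour trees that always predict the majority class, so macro F1 is used here as an assumed alternative. (For classifiers, scikit-learn already uses stratified folds when `cv` is an integer; the explicit `StratifiedKFold` just adds shuffling.)

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV, StratifiedKFold
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=1000, n_features=10, weights=[0.75], random_state=0)

param_grid = {
    "max_depth": [3, 5, 8, None],
    "min_samples_leaf": [1, 5, 20],
    "class_weight": [None, "balanced"],
}
search = GridSearchCV(
    DecisionTreeClassifier(random_state=0),
    param_grid,
    # Macro F1 weighs both classes equally, unlike accuracy
    scoring="f1_macro",
    cv=StratifiedKFold(n_splits=5, shuffle=True, random_state=0),
)
search.fit(X, y)
print(search.best_params_, round(search.best_score_, 3))
```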

lapis sequoia Sep 12, 2021, 11:37 AM

#

Precision and recall for class 0 are 0, which means it's always predicting class 1. I'd say that's quite bad regardless of the 74% accuracy. Try increasing the depth? I feel this few rules may not be enough for the tree.
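A quick illustration of why 74% accuracy can coexist with zero recall on the minority class, using hypothetical labels (74 of class 1, 26 of class 0) and a "model" that always predicts 1:

```python
from sklearn.metrics import accuracy_score, classification_report, recall_score

# Hypothetical labels: 74 samples of class 1, 26 of class 0
y_true = [1] * 74 + [0] * 26
# A model that always predicts the majority class
y_pred = [1] * 100

acc = accuracy_score(y_true, y_pred)
recall_0 = recall_score(y_true, y_pred, pos_label=0)
print(f"accuracy={acc:.2f}, recall(class 0)={recall_0:.2f}")  # accuracy=0.74, recall(class 0)=0.00
print(classification_report(y_true, y_pred, zero_division=0))
```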

hoary wigeon Sep 12, 2021, 11:38 AM

#

There's no change in accuracy beyond max_depth 3

lapis sequoia Sep 12, 2021, 11:39 AM

#

I mean increase max depth, say to 5?

hoary wigeon Sep 12, 2021, 11:39 AM

#

#

after changing class weight

lapis sequoia Sep 12, 2021, 11:40 AM

#

I'm saying increase depth🤕

hoary wigeon Sep 12, 2021, 11:40 AM

#

hoary wigeon Sep 12, 2021, 11:40 AM

#

hoary wigeon theres no change in accuracy after max_depth 3++

i told ya

lapis sequoia Sep 12, 2021, 11:41 AM

#

weights 2 1?

hoary wigeon Sep 12, 2021, 11:42 AM

#

cool

#

lapis sequoia Sep 12, 2021, 11:42 AM

#

I'd say this is better.

hoary wigeon Sep 12, 2021, 11:42 AM

#

yeah

lapis sequoia Sep 12, 2021, 11:42 AM

#

than that 75

hoary wigeon Sep 12, 2021, 11:42 AM

#

at least the model is predicting both classes

#

How do you calculate the weights?

lapis sequoia Sep 12, 2021, 11:44 AM

#

Nice question. In an IR system I'd set up something like AW = y, since we have y, and solve it by inversion, but I'm not sure about this one.

hoary wigeon Sep 12, 2021, 11:48 AM

#

I didn't get this

lapis sequoia Sep 12, 2021, 11:49 AM

#

class_weight: dict, list of dict or "balanced", default=None

Weights associated with classes in the form {class_label: weight}. If None, all classes are supposed to have weight one. For multi-output problems, a list of dicts can be provided in the same order as the columns of y.

Note that for multioutput (including multilabel) weights should be defined for each class of every column in its own dict. For example, for four-class multilabel classification weights should be [{0: 1, 1: 1}, {0: 1, 1: 5}, {0: 1, 1: 1}, {0: 1, 1: 1}] instead of [{1:1}, {2:5}, {3:1}, {4:1}].
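For a plain binary problem, the same scikit-learn page describes `class_weight="balanced"` as `n_samples / (n_classes * np.bincount(y))`. A sketch with hypothetical labels (74 vs 26) showing the helper and the formula agree; the result can be passed as a dict such as `{0: ..., 1: ...}` or simply as `class_weight="balanced"`:

```python
import numpy as np
from sklearn.utils.class_weight import compute_class_weight

# Hypothetical imbalanced labels: 74 of class 1, 26 of class 0
y = np.array([1] * 74 + [0] * 26)
classes = np.unique(y)

# "balanced" weight for each class = n_samples / (n_classes * count of that class)
weights = compute_class_weight(class_weight="balanced", classes=classes, y=y)
manual = len(y) / (len(classes) * np.bincount(y))

print(dict(zip(classes.tolist(), weights.round(3).tolist())))  # {0: 1.923, 1: 0.676}
```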

#

is yours multilabel?

#

I think not. Then shouldn't it be like below?

hoary wigeon Sep 12, 2021, 11:50 AM

#

lapis sequoia is yours multilabel?

nope

lapis sequoia Sep 12, 2021, 11:50 AM

#

why not just remove weights for now? what accuracy does it give?

hoary wigeon Sep 12, 2021, 11:51 AM

#

lapis sequoia Sep 12, 2021, 11:52 AM

#

seems better.

hoary wigeon Sep 12, 2021, 11:52 AM

#

I guess it's changing because of the train/test split

#

the previous results were much worse

lapis sequoia Sep 12, 2021, 11:53 AM

#

We also have a depth of 5 right now.

hoary wigeon Sep 12, 2021, 11:53 AM

#

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.1, random_state=1, shuffle=True, stratify=y)  # Splitting data
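A sketch with hypothetical labels (740 vs 260) showing what that call guarantees: `stratify=y` keeps the class ratio the same in both splits, and a fixed `random_state` makes the split repeatable, so run-to-run differences should come from the model settings (or an unseeded model) rather than from the split:

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Hypothetical imbalanced data: 740 of class 1, 260 of class 0
y = np.array([1] * 740 + [0] * 260)
X = np.arange(len(y)).reshape(-1, 1)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.1, random_state=1, shuffle=True, stratify=y
)
# Both splits keep the original 74% / 26% class balance
print(y_train.mean(), y_test.mean())
```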

zenith crow Sep 12, 2021, 12:38 PM

#

hello

#

pls help

#

my python is pretty basic

#

here

#

five_info.append("Input=")

#

Is there a way I can append "Input" to the first list?

#

five_info = [[],[]]

#

five_info.append("Input=",[1])

#

it doesn't work
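`list.append` takes exactly one argument, which is why `five_info.append("Input=", [1])` raises a `TypeError`. To add to the first inner list, index into the outer list first:

```python
five_info = [[], []]

# five_info[0] is the first inner list; append to that list directly
five_info[0].append("Input=")
print(five_info)  # [['Input='], []]
```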

serene scaffold Sep 12, 2021, 1:20 PM

#

@zenith crow hello, this is not a data science question

#

See #❓｜how-to-get-help

zenith crow Sep 12, 2021, 1:21 PM

#

sry

light warren Sep 12, 2021, 2:14 PM

#

hey, I am trying to run this code that I found on my school help sheet, but I keep getting this error:

#

https://gyazo.com/1e1c0d40a55fd1900918f4a18489962e


prime hearth Sep 12, 2021, 2:15 PM

#

i guess would it be okay to ask

#

have you broken down

#

the .methods

#

so portfolio.sample()

#

does that run successfully?

#

If it doesn't, then that's where the error is, due to an invalid argument

light warren Sep 12, 2021, 2:21 PM

#

https://gyazo.com/e040430800077a5382b7c59a3aea5421


#

same error

prime hearth Sep 12, 2021, 2:24 PM

#

Yes

#

so thats the error

#

it's the argument that is being passed

#

I don't know much about time and date handling

#

but you can try looking up (googling) the error

#

and that should answer the problem

light warren Sep 12, 2021, 2:25 PM

#

Should I remove the date column? Every other column is float64 and that one is object
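The actual error is only in the screenshot, so this is a guess: if the object-dtype date column is the problem, the two usual options are parsing it into real datetimes (often as the index) or keeping only numeric columns. A sketch on a hypothetical stand-in DataFrame; the column names and values are made up:

```python
import pandas as pd

# Hypothetical stand-in for the real data: one string date column, numeric columns
portfolio = pd.DataFrame({
    "Date": ["2021-09-01", "2021-09-02", "2021-09-03"],
    "A": [1.0, 2.0, 3.0],
    "B": [4.0, 5.0, 6.0],
})

# Option 1: convert the strings to datetime64 and use them as the index
by_date = portfolio.assign(Date=pd.to_datetime(portfolio["Date"])).set_index("Date")

# Option 2: drop every non-numeric column
numeric_only = portfolio.select_dtypes(include="number")

print(by_date.index.dtype)
print(numeric_only.columns.tolist())  # ['A', 'B']
```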

prime hearth Sep 12, 2021, 2:25 PM

#

I'm not sure; that depends on what the error means, which a quick Google search and the solutions on Stack Overflow should tell you

lapis sequoia Sep 12, 2021, 2:30 PM

#

There's one value in the batch that is coming out as zero. How can I see what is in the batch? Can anyone help, please?

#

serene scaffold Sep 12, 2021, 2:32 PM

#

lapis sequoia there's one value which is coming has zero in the batch. How to see what is ther...

Can you provide all the code as text?

#

!paste

arctic wedgeBOT Sep 12, 2021, 2:32 PM

#

Pasting large amounts of code

If your code is too long to fit in a codeblock in discord, you can paste your code here:
https://paste.pydis.com/

After pasting your code, save it by clicking the floppy disk icon in the top right, or by typing ctrl + S. After doing that, the URL should change. Copy the URL and post it here so others can see it.

hasty grail Sep 12, 2021, 2:33 PM

#

lapis sequoia there's one value which is coming has zero in the batch. How to see what is ther...

Use torch.nonzero to find the index of the batch
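A sketch of that suggestion on a hypothetical 4 x 5 batch, finding which samples (rows) sum to zero, which is the quantity the validation loop later divides by:

```python
import torch

# Hypothetical batch of 4 samples x 5 features; sample 2 is all zeros
batch = torch.tensor([
    [1.0, 0.0, 2.0, 0.0, 1.0],
    [0.0, 3.0, 0.0, 0.0, 0.0],
    [0.0, 0.0, 0.0, 0.0, 0.0],
    [1.0, 1.0, 1.0, 1.0, 1.0],
])

row_sums = batch.sum(dim=1)
# Indices (within this batch) of samples whose values sum to zero
empty_rows = torch.nonzero(row_sums == 0, as_tuple=True)[0]
print(empty_rows.tolist())  # [2]
```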

lapis sequoia Sep 12, 2021, 2:34 PM

#

serene scaffold Can you provide all the code as text?

model.eval()
print("entering validation")
data_iterator = tqdm(loader, leave=False, unit="batch", disable=silent)
losses = []
rl_loss = []
kl_loss = []
counts = []
for index, batch in enumerate(data_iterator):
    batch = batch[0]
    if cuda:
        batch = batch.cuda(non_blocking=True)
    recon, mean, logvar = model(batch)
    rl, kl, loss = model.loss(batch, recon, mean, logvar)
    losses.append(loss.detach().cpu())
    rl_loss.append(rl.detach().cpu())
    kl_loss.append(kl.detach().cpu())
    # losses.append(model.loss(batch, recon, mean, logvar).detach().cpu())
    batch_sum = batch.sum(1).detach().cpu()
    # b = (batch_sum == 0)
    # if torch.sum(b):
    #     a = 0
    counts.append(batch_sum)
# dimon = torch.cat(counts).mean().exp().item()
# nomi = torch.cat(losses)
return float((torch.cat(losses) / torch.cat(counts)).mean().exp().item())

serene scaffold Sep 12, 2021, 2:34 PM

#

lapis sequoia model.eval() print("entering validation") data_iterator = tqdm(loader, l...

This has a return statement that isn't part of a function. Try putting the whole thing in the paste bin, per my earlier message.

lapis sequoia Sep 12, 2021, 2:35 PM

#

hasty grail Use `torch.nonzero` to find the index of the batch

I already found the index, as shown in my screenshot, but my question is: how do I know what is in that sample?

hasty grail Sep 12, 2021, 2:36 PM

#

Oh, sorry didn't catch that

#

I guess you can iterate through your data until the index is equal to that number

#

and inspect the corresponding data sample

lapis sequoia Sep 12, 2021, 2:36 PM

#

I don't understand how to do that

lapis sequoia Sep 12, 2021, 2:37 PM

#

serene scaffold This has a `return` statement that isn't part of a function. Try putting the who...

What's a paste bin? Sorry, I'm not aware of it.

serene scaffold Sep 12, 2021, 2:37 PM

#

lapis sequoia whats a paste bin? sorry im not aware of it.

!paste

arctic wedgeBOT Sep 12, 2021, 2:37 PM

#

Pasting large amounts of code

If your code is too long to fit in a codeblock in discord, you can paste your code here:
https://paste.pydis.com/

After pasting your code, save it by clicking the floppy disk icon in the top right, or by typing ctrl + S. After doing that, the URL should change. Copy the URL and post it here so others can see it.

hasty grail Sep 12, 2021, 2:37 PM

#

for index, batch in enumerate(data_iterator):
    if index == empty_batch_index:
        # Do something with the batch, e.g. inspect it and stop iterating
        print(batch)
        break

serene scaffold Sep 12, 2021, 2:37 PM

#

^ click the link

lapis sequoia Sep 12, 2021, 2:38 PM

#

serene scaffold ^ click the link

i pasted the code and saved it.

serene scaffold Sep 12, 2021, 2:38 PM

#

lapis sequoia i pasted the code and saved it.

You have to share the URL for it

lapis sequoia Sep 12, 2021, 2:39 PM

#

https://paste.pythondiscord.com/nubizovepa.properties

serene scaffold Sep 12, 2021, 2:39 PM

#

lapis sequoia https://paste.pythondiscord.com/nubizovepa.properties

there's no way this is the whole thing as you have a return statement outside of a function. This would not run.

lapis sequoia Sep 12, 2021, 2:39 PM

#

Inside the if condition, how can I get the data?

hasty grail Sep 12, 2021, 2:40 PM

#

wait, your index is the index within a batch

lapis sequoia Sep 12, 2021, 2:40 PM

#

yes

hasty grail Sep 12, 2021, 2:40 PM

#

in that case, batch[empty_batch_index] would be the data you need

lapis sequoia Sep 12, 2021, 2:50 PM

#

it gives me this:

#

batch[125]
tensor([0., 0., 0., ..., 0., 0., 0.])

#

but I want to see what is this tensor associated with in my data..

hasty grail Sep 12, 2021, 2:54 PM

#

you'll have to trace to your data source and iterate through it in the same way as you did here

#

then find the corresponding sample in the batch

lapis sequoia Sep 12, 2021, 2:55 PM

#

Could you tell me how to do this?

#

because this is where I'm stuck

#

unless I know what is causing the issue, I can't fix it

hasty grail Sep 12, 2021, 2:55 PM

#

You have control over the data iterator, right?

#

as in, it is constructed inside your code

lapis sequoia Sep 12, 2021, 2:56 PM

#

yes.

hasty grail Sep 12, 2021, 2:56 PM

#

ok so you need to go to whatever line that is

#

and figure out what data is being fed to it

#

apart from the index of the empty sample in the batch (i.e. sample index), also print out the batch index (index in the code you showed above)

#

then, you find the piece of data that corresponds to the batch index and the sample index
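For an unshuffled loader, the batch index and sample index map to a dataset position with simple arithmetic. A sketch, where the function name and the batch size of 256 are hypothetical (use your DataLoader's actual `batch_size`); it only holds with `shuffle=False` and the default sequential sampler:

```python
def dataset_index(batch_index, sample_index, batch_size):
    """Position of a sample in the underlying dataset.

    Valid only when the DataLoader iterates in order (shuffle=False, no custom
    sampler); with shuffle=True the mapping is random.
    """
    return batch_index * batch_size + sample_index


# e.g. sample 125 of batch 3 with a hypothetical batch_size of 256
print(dataset_index(3, 125, 256))  # 893
```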

lapis sequoia Sep 12, 2021, 3:00 PM

#

OK, I'll try to search for some code to do that. If you have something, that would help too.

#

also, im using the torch dataloader.

hasty grail Sep 12, 2021, 3:01 PM

#

it would be helpful if you set up your own dummy dataloader that, for example, prints out the file source for the currently yielded batch

#

and iterate through the actual and dummy dataloader in parallel (using zip)

#

that way, when you get to that problematic batch, you can look at the console/log and identify the relevant files
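One caveat: two shuffled DataLoaders iterated with `zip` generally visit samples in different orders unless both use `shuffle=False` or share an identically seeded generator. An alternative sketch, assuming a map-style dataset: wrap it so each item also returns its own index, which then survives shuffling and batching. `WithIndex` and the toy data are hypothetical:

```python
import torch
from torch.utils.data import DataLoader, Dataset, TensorDataset


class WithIndex(Dataset):
    """Wraps a map-style dataset so each item also carries its dataset index."""

    def __init__(self, base):
        self.base = base

    def __len__(self):
        return len(self.base)

    def __getitem__(self, idx):
        return self.base[idx], idx


# Hypothetical data: 10 samples x 3 features, sample 7 is all zeros
data = torch.ones(10, 3)
data[7] = 0
loader = DataLoader(WithIndex(TensorDataset(data)), batch_size=4, shuffle=True)

found_all = []
for (batch,), indices in loader:
    empty = batch.sum(dim=1) == 0
    # indices[empty] are positions in the original dataset, even with shuffle=True
    found_all.extend(indices[empty].tolist())

print("all-zero samples at dataset index:", found_all)
```

With the dataset index in hand, `your_dataset[i]` (or whatever file or row it was built from) shows the raw sample.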

lapis sequoia Sep 12, 2021, 3:03 PM

#

ah!! i shall do that then.

#

thank you so much 🙂

#

it really helped a lot

hasty grail Sep 12, 2021, 3:03 PM

#

you're welcome 🙂

lapis sequoia Sep 12, 2021, 3:03 PM

#

Also, is there no way to do this with the built-in tools?

hasty grail Sep 12, 2021, 3:04 PM

#

I haven't worked with PyTorch all that much, so idk

lapis sequoia Sep 12, 2021, 3:04 PM

#

ah okok

#

thats ok, this was very helpful 🙂

serene scaffold Sep 12, 2021, 3:20 PM

#

I have this

                                    CRF               BiLSTM+CRF               BioBERT+CRF              
                                      P      R     F1          P      R     F1           P      R     F1
Animal   CellLine                 0.952  0.513  0.667      0.719  0.590  0.648       0.739  0.872  0.800
         GroupName                0.819  0.457  0.586      0.752  0.581  0.656       0.684  0.716  0.699
         GroupSize                0.835  0.662  0.739      0.764  0.706  0.734       0.723  0.861  0.786
         SampleSize               0.800  0.279  0.414      0.529  0.419  0.468       0.630  0.674  0.652
         Sex                      0.980  0.760  0.856      0.977  0.871  0.921       0.958  0.913  0.935
         Species                  0.982  0.844  0.907      0.978  0.893  0.933       0.952  0.917  0.934
         Strain                   0.917  0.742  0.820      0.865  0.825  0.845       0.832  0.863  0.847

I want to get a boolean dataframe where, for each metric (P, R, F1), the best value among the three algorithms is True, but idxmax doesn't support level=

serene scaffold Sep 12, 2021, 3:49 PM

#

df.groupby(level=0, axis=1).apply(lambda x: x.droplevel(axis='columns', level=0) == maxx), where maxx = df.max(level=1, axis=1), appears to be it.

#

annoying 😄

serene scaffold Sep 12, 2021, 4:27 PM

#

Now how can I make this one statement?

is_max = (
    df.groupby(level=0, axis=1)
        .apply(lambda d: 
            d.droplevel(axis=1, level=0)
            == 
            df.max(level=1, axis=1)
        )
)

df_str = df.applymap(lambda x: f'{x:.3f}')
df_str[is_max] = r'\textbf{' + df_str + '}'

print(df_str.to_latex(escape=False))
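One way to get it down to a single expression: compute each metric's maximum across algorithms with a transpose-and-groupby, then use `DataFrame.mask` with a callable. This is a sketch on a hypothetical two-row excerpt of the table above; it also avoids `max(level=...)`, which pandas 2.0 removed. Tied values would all be bolded.

```python
import pandas as pd

# Hypothetical two-row excerpt of the scores table
columns = pd.MultiIndex.from_product([["CRF", "BiLSTM+CRF", "BioBERT+CRF"], ["P", "R", "F1"]])
index = pd.MultiIndex.from_tuples([("Animal", "CellLine"), ("Animal", "Sex")])
df = pd.DataFrame(
    [[0.952, 0.513, 0.667, 0.719, 0.590, 0.648, 0.739, 0.872, 0.800],
     [0.980, 0.760, 0.856, 0.977, 0.871, 0.921, 0.958, 0.913, 0.935]],
    index=index,
    columns=columns,
)

# One statement: format every cell, then wrap cells equal to their metric's max
df_str = df.apply(lambda col: col.map("{:.3f}".format)).mask(
    # groupby(level=1) on the transposed frame groups by metric (P, R, F1);
    # transform("max") broadcasts each metric's max back to every algorithm
    df.eq(df.T.groupby(level=1).transform("max").T),
    lambda s: r"\textbf{" + s + "}",
)
```

`print(df_str.to_latex(escape=False))` then produces the table as before.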

frosty flame Sep 12, 2021, 5:27 PM

#

hey

#

i need help with something

#

I am trying to import tensorflow_hub

#

but i keep getting this:

#

cannot import name 'parameter_server_strategy_v2' from 'tensorflow.python.distribute'

serene scaffold Sep 12, 2021, 6:08 PM

#

frosty flame cannot import name 'parameter_server_strategy_v2' from 'tensorflow.python.distri...

please share the whole error message and the code that caused the error message as text (no screenshots).

frosty flame Sep 12, 2021, 6:10 PM

#

Can I send it as a file? I can't send long messages since I don't have Nitro

serene scaffold Sep 12, 2021, 6:10 PM

#

frosty flame can i send as a file, cuz i cant send large texts while iam not nitro

you can't send files in this server. Try this:

#

!paste

arctic wedgeBOT Sep 12, 2021, 6:10 PM

#

Pasting large amounts of code

If your code is too long to fit in a codeblock in discord, you can paste your code here:
https://paste.pydis.com/

After pasting your code, save it by clicking the floppy disk icon in the top right, or by typing ctrl + S. After doing that, the URL should change. Copy the URL and post it here so others can see it.

frosty flame Sep 12, 2021, 6:10 PM

#

!paste

serene scaffold Sep 12, 2021, 6:10 PM

#

You have to click the link from the bot message

frosty flame Sep 12, 2021, 6:11 PM

#

paste it there?

#

then what?

serene scaffold Sep 12, 2021, 6:11 PM

#

yes, then save and share the URL

frosty flame Sep 12, 2021, 6:11 PM

#

https://paste.pythondiscord.com/yijisikeha.py

#

there you go

#

this is the error

serene scaffold Sep 12, 2021, 6:12 PM

#

@frosty flame what tensorflow version are you using?
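Since importing `tensorflow_hub` is what fails, a safer way to answer this is to ask the package metadata directly, without importing TensorFlow. A small stdlib sketch (Python 3.8+); run it with the same interpreter Spyder/Anaconda uses. The helper name is hypothetical; the package names are the ones pip installs:

```python
from importlib.metadata import PackageNotFoundError, version


def installed_versions(*packages):
    """Return {package: version string, or None if not installed}."""
    found = {}
    for name in packages:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            found[name] = None
    return found


print(installed_versions("tensorflow", "tensorflow-estimator", "tensorflow-hub"))
```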

frosty flame Sep 12, 2021, 6:12 PM

#

The latest? I installed it via Anaconda

serene scaffold Sep 12, 2021, 6:13 PM

#

I've never used anaconda

#

pip install --upgrade tensorflow-estimator==2.3.0

Try that

#

but for anaconda

frosty flame Sep 12, 2021, 6:13 PM

#

then I tried it in PyCharm, and the same issue happened

serene scaffold Sep 12, 2021, 6:13 PM

#

idk how to install stuff with anaconda.

frosty flame Sep 12, 2021, 6:13 PM

#

it's Spyder

serene scaffold Sep 12, 2021, 6:13 PM

#

frosty flame then i tried it using pycharm, and the same issue happened

the editor you use doesn't affect this as the environment is managed by anaconda, not the editor.

frosty flame Sep 12, 2021, 6:13 PM

#

anaconda is like a hub

serene scaffold Sep 12, 2021, 6:14 PM

#

have you tried not using anaconda?

#

most people don't, so it's easier to get help that way.

frosty flame Sep 12, 2021, 6:14 PM

#

yes, and it was the same issue

serene scaffold Sep 12, 2021, 6:14 PM

#

well, try installing tensorflow-estimator==2.3.0 specifically

#

two pages told me that that is the solution

frosty flame Sep 12, 2021, 6:15 PM

#

i executed what you sent

#

and another issue appeared

serene scaffold Sep 12, 2021, 6:15 PM

#

what is the other issue?

frosty flame Sep 12, 2021, 6:16 PM

#

serene scaffold idk how to install stuff with anaconda.

https://paste.pythondiscord.com/poquvacuti.py

serene scaffold Sep 12, 2021, 6:16 PM

#

I see. what are you trying to do, exactly?

#

are you following a tutorial?

frosty flame Sep 12, 2021, 6:17 PM

#

trying to make ESP bot

frosty flame Sep 12, 2021, 6:17 PM

#

serene scaffold I see. what are you trying to do, exactly?

no not really

serene scaffold Sep 12, 2021, 6:17 PM

#

can you help me understand how you ended up at this problem?

frosty flame Sep 12, 2021, 6:18 PM

#

it's just importing the library that raises this error

serene scaffold Sep 12, 2021, 6:18 PM

#

what OS are you on?

frosty flame Sep 12, 2021, 6:18 PM

#

Windows 10 pro

serene scaffold Sep 12, 2021, 6:19 PM

#

what python version?

frosty flame Sep 12, 2021, 6:19 PM

#

python 3.8

serene scaffold Sep 12, 2021, 6:20 PM

#

I'm trying to get it working locally

#

I'm going to go do a thing while it installs

frosty flame Sep 12, 2021, 6:21 PM

#

ok thanks

serene scaffold Sep 12, 2021, 6:28 PM

#

frosty flame ok thanks

pip install tensorflow_hub
pip install tensorflow
python -c "import tensorflow_hub; print('Done!')"

#

worked for me

#

though I'm on Linux, so you may have to install the build tools

#

!build

arctic wedgeBOT Sep 12, 2021, 6:28 PM

#

Microsoft Visual C++ Build Tools

When you install a library through pip on Windows, sometimes you may encounter this error:

error: Microsoft Visual C++ 14.0 or greater is required. Get it with "Microsoft C++ Build Tools": https://visualstudio.microsoft.com/visual-cpp-build-tools/

This means the library you're installing has code written in other languages and needs additional tools to install. To install these tools, follow the following steps: (Requires 6GB+ disk space)

1. Open https://visualstudio.microsoft.com/visual-cpp-build-tools/.
2. Click Download Build Tools >. A file named vs_BuildTools or vs_BuildTools.exe should start downloading. If no downloads start after a few seconds, click "click here" to retry.
3. Run the downloaded file. Click Continue to proceed.
4. Choose C++ build tools and press Install. You may need a reboot after the installation.
5. Try installing the library via pip again.

frosty flame Sep 12, 2021, 6:29 PM

#

I already have build tools for IntelliJ; do they work?

#

ERROR: Could not install packages due to an OSError: [WinError 5] Access is denied: 'D:\Apps\Anaconda\Lib\site-packages\~cipy\integrate\lsoda.cp38-win_amd64.pyd'

#

this happens when I run pip install tensorflow

frosty flame Sep 12, 2021, 6:32 PM

#

serene scaffold ```py pip install tensorflow_hub pip install tensorflow python -c "import tensor...

.

serene scaffold Sep 12, 2021, 6:33 PM

#

frosty flame i already have build tools for intellig, do they work?

I don't know

serene scaffold Sep 12, 2021, 6:33 PM

#

frosty flame ERROR: Could not install packages due to an OSError: [WinError 5] Access is deni...

It's still trying to use anaconda

frosty flame Sep 12, 2021, 6:34 PM

#

but i think it worked

#

no errors appear when I run it

#

thanks for helping

serene scaffold Sep 12, 2021, 6:34 PM

#

💚

frosty flame Sep 12, 2021, 6:35 PM

#

💚

boreal bear Sep 12, 2021, 8:20 PM

#

Hello all, I am looking at some survey data and I have filled my 'NaN' values with 'NO RESPONSE', but I want to count how many values are 'NO RESPONSE' in the pandas DataFrame. How can I do this? .isnull() won't catch the 'NO RESPONSE' values.
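A minimal sketch, assuming the placeholder is always the exact string `'NO RESPONSE'`: compare every cell to it, then sum the boolean mask. The survey DataFrame is hypothetical. (Counting before `fillna` with `df.isna().sum()` would give the same numbers.)

```python
import pandas as pd

# Hypothetical survey answers after fillna("NO RESPONSE")
df = pd.DataFrame({
    "q1": ["yes", "NO RESPONSE", "no", "NO RESPONSE"],
    "q2": ["NO RESPONSE", "maybe", "yes", "no"],
    "q3": ["no", "NO RESPONSE", "yes", "yes"],
})

mask = df.eq("NO RESPONSE")            # True wherever a cell is exactly the placeholder
per_column = mask.sum()                # count per column
total = int(mask.to_numpy().sum())     # count across the whole DataFrame
rows_with_any = int(mask.any(axis=1).sum())

print({col: int(n) for col, n in per_column.items()})  # {'q1': 2, 'q2': 1, 'q3': 1}
print(total, rows_with_any)                            # 4 3
```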

thin prism Sep 12, 2021, 8:24 PM

#

boreal bear Hello all, I am looking at some survey data and I have filled my 'NaN' values wi...

IDK if this is correct, but you can try filtering for the string "NO RESPONSE"

boreal bear Sep 12, 2021, 8:25 PM

#

I used .fillna() on the dataset but I guess i could also go back and remove that and count the NAs but that's a little annoying

thin prism Sep 12, 2021, 8:26 PM

#

yeah, I'm more of an old-school coder, so

boreal bear Sep 12, 2021, 8:26 PM

#

What's the easiest way to filter 'NO RESPONSE' across 42 columns?

thin prism Sep 12, 2021, 8:26 PM

#

I hardcode stuff haha

#

are you using pd?

boreal bear Sep 12, 2021, 8:26 PM

#

yeah

thin prism Sep 12, 2021, 8:27 PM

#

try searching

#

df.filter(regex=)

boreal bear Sep 12, 2021, 8:27 PM

#

ok will try. Thank you so much!

thin prism Sep 12, 2021, 8:27 PM

#

I really do hope I'm not wasting your time haha

boreal bear Sep 12, 2021, 8:27 PM

#

Of course not. It's a learning process.

#

And I thoroughly enjoy the process

thin prism Sep 12, 2021, 8:28 PM

#

let me know if that method works 🙂

boreal bear Sep 12, 2021, 8:28 PM

#

thanks will do

#

doesn't seem to be working

#

df_nores = df.filter(regex='NO RESPONSE', axis=1)

#

is this syntax correct?
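The syntax is valid, but `DataFrame.filter` matches the regex against column (or index) labels, not cell values, so it returns no columns here. A sketch on a hypothetical two-column frame, contrasting it with a value-based check that also answers "which columns contain it":

```python
import pandas as pd

df = pd.DataFrame({
    "age": ["NO RESPONSE", "34"],
    "income": ["50k", "NO RESPONSE"],
})

# filter() selects columns whose *names* match the regex, so this is empty
by_label = df.filter(regex="NO RESPONSE", axis=1)
print(by_label.shape)  # (2, 0)

# To find columns whose *values* contain the placeholder, test the values
cols_with_placeholder = [c for c in df.columns if df[c].eq("NO RESPONSE").any()]
print(cols_with_placeholder)  # ['age', 'income']
```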

rigid zodiac Sep 12, 2021, 8:31 PM

#

Quick question: how do you apply a deep learning model saved as an H5 file to new data?

thin prism Sep 12, 2021, 8:34 PM

#

boreal bear is this syntax correct?

so

#

can I know the column name for the values that contain NO RESPONSE?

boreal bear Sep 12, 2021, 8:35 PM

#

thin prism can i know the column title for the value that contains no response?

It is in a lot of columns

#

there are a lot of null values

thin prism Sep 12, 2021, 8:35 PM

#

ohhhhh

#

so you have multiple null values inside the multiple columns

boreal bear Sep 12, 2021, 8:35 PM

#

130 to be exact!!

boreal bear Sep 12, 2021, 8:35 PM

#

thin prism so you have multiple null values inside the multiple columns

yes

thin prism Sep 12, 2021, 8:36 PM

#

in that case we can try

#

uhhh .str.contains('NO RESPONSE') on entire column

#

try searching for that method

#

hope that works

boreal bear Sep 12, 2021, 8:39 PM

#

Maybe I can try that and iterate through

thin prism Sep 12, 2021, 8:39 PM

#

mhm 🙂

lapis sequoia Sep 12, 2021, 8:51 PM

#

Hey, is there some machine learning specialist here that could tell me what kind of data quality I need to get what I want in my project?

#

Also would love to hear about what algorithm to use and so on

faint ravine Sep 12, 2021, 8:55 PM

#

lapis sequoia Hey, is there some machine learning specialist here that could tell me what kind...

That's a very broad question

lapis sequoia Sep 12, 2021, 9:09 PM

#

I know 😄

lapis sequoia Sep 12, 2021, 9:11 PM

#

faint ravine That's a very broad question

The aim of my project is to provide a user-specific, individualized presentation of product images, maximizing customer attraction and thereby increasing conversion.
The data for this is collected from bot users' interactions with the shop.
And just to provide you a brief overview:
At first, the customer is shown randomly selected product images from the backend.
Currently, the majority of the data consists of the correlation between the product image viewed and whether a purchase was subsequently made.
The selection of the best product image should then be based on some machine learning algorithm.

faint ravine Sep 12, 2021, 9:29 PM

#

lapis sequoia The aim of my project is to provide a user-specific, individualized presentation...

So you have lots of different product images for each product and you're trying to decide which one works the best for which user?

lapis sequoia Sep 12, 2021, 9:32 PM

#

exactly

#

and the data is the image_url

#

do you think you can help with that? @faint ravine

prime hearth Sep 12, 2021, 9:47 PM

#

I am just a student

#

not sure if this might help

#

but I would try all algorithms

#

since this is part of the machine learning model training process

#

right after model training

#

Identify problem -> data collection -> data prep (feature engineering, feature selection, ...) -> exploratory analysis (visualizing distributions, etc.) -> model training -> evaluate -> repeat

#

Some algorithms to consider would be classifiers or CNNs, and recommendation systems (content-based/collaborative)

faint ravine Sep 12, 2021, 9:56 PM

#

lapis sequoia do you think you can help with that? @faint ravine

I'm not sure how you collect your data. In all honesty, I don't think you need a machine learning model to decide which product image works best. My 2 cents:

Most people are similar, and if one image looks good for one person, it'll look good for anyone else.
You could probably use a machine learning model to create a recommendation system.
If you have lots of product images and want to decide which one gets the most user engagement, you can show the images randomly and evenly to a number of users and compute a rating for each image. Over time, the one with the highest rating becomes the one displayed by default.

Again, it's really difficult for me to understand why you need an ML model to solve the problem of the "optimal image". You might have a good reason that I can't see.
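The non-ML approach described above is essentially an A/B test over images. A minimal sketch, assuming each event is a hypothetical `(image_id, purchased)` pair and that "rating" means purchase rate; the minimum-views threshold is an added assumption to avoid crowning an image on a handful of views:

```python
from collections import defaultdict

# Hypothetical log of (image_id, purchased) events from evenly randomized display
events = [
    ("img_a", False), ("img_b", True), ("img_c", False),
    ("img_a", True), ("img_b", True), ("img_c", False),
    ("img_a", False), ("img_b", False), ("img_c", True),
]


def best_image(events, min_views=3):
    """Return (image_id, conversion_rate) for the image with the highest purchase rate.

    Images shown fewer than min_views times are skipped, since a rate computed
    from very few views is mostly noise.
    """
    views = defaultdict(int)
    purchases = defaultdict(int)
    for image_id, purchased in events:
        views[image_id] += 1
        purchases[image_id] += int(purchased)
    rates = {i: purchases[i] / views[i] for i in views if views[i] >= min_views}
    winner = max(rates, key=rates.get)
    return winner, rates[winner]


print(best_image(events))  # ('img_b', 0.6666666666666666)
```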

lapis sequoia Sep 12, 2021, 10:02 PM

#

So by optimal image, I actually meant a recommendation system

#

Very good points, indeed

#

@faint ravine

#

is what I'm doing to collect data
Though I can't use production data, so I use bots for this

faint ravine Sep 12, 2021, 10:06 PM

#

Hold on. Let's say you have a shoe product, call it X. You have N images of product X and you're trying to find the one that works best for User P, right?

lapis sequoia Sep 12, 2021, 10:06 PM

#

exactly

faint ravine Sep 12, 2021, 10:08 PM

#

So the ML model is a function F that takes in: UserData, ProductID and outputs ProductImage.

wide rose Sep 12, 2021, 10:08 PM

#

oh

#

I read yesterday at

#

5

lapis sequoia Sep 12, 2021, 10:09 PM

#

Inside UserData there has to be some sort of previous purchases, metadata, etc.

faint ravine Sep 12, 2021, 10:09 PM

#

How many product images on average do you have per product? (You must have a lot)

lapis sequoia Sep 12, 2021, 10:10 PM

#

So for testing purposes, I could use 4-10 per product

#

Is this enough to make it work? Or would I need something like 500?

faint ravine Sep 12, 2021, 10:12 PM

#

Things you can learn from past purchases would be colors, background color, etc.

#

You can extract features from the images of a user's past purchases and try to find the image of the recommended product that most resembles them.

lapis sequoia Sep 12, 2021, 10:12 PM

#

They're just different resolutions for now, so all we can work with is the different image URLs as a feature

faint ravine Sep 12, 2021, 10:13 PM

#

Not sure there's a whole lot of things you can find that are common between two entirely different product images..

lapis sequoia Sep 12, 2021, 10:14 PM

#

Yeah, it seems very hard

faint ravine Sep 12, 2021, 10:15 PM

#

For example, let's say User: Jack has purchased a shoe that has a red color in the past. And now he's browsing curtains and trying to purchase a curtain for his new room. You have 4 images of curtains with different colors. The best image to show him is the one of the red curtain

#

So this is just a guess, but color is something that can be an individualized, user-specific feature.
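A rough sketch of the color idea, assuming images are available as RGB arrays (e.g. loaded with Pillow and converted with NumPy). Mean color and Euclidean distance are deliberately crude stand-ins for real image features, and the solid-color "images" are hypothetical test data:

```python
import numpy as np


def mean_color(image):
    """Average RGB value of an H x W x 3 image array."""
    return image.reshape(-1, 3).mean(axis=0)


def pick_image_by_color(past_purchase_images, candidate_images):
    """Index of the candidate whose mean color is closest to the user's purchase history."""
    user_profile = np.mean([mean_color(img) for img in past_purchase_images], axis=0)
    distances = [np.linalg.norm(mean_color(img) - user_profile) for img in candidate_images]
    return int(np.argmin(distances))


def solid(rgb, size=4):
    """Hypothetical solid-color test image."""
    return np.full((size, size, 3), rgb, dtype=float)


red, blue, green = (200, 30, 30), (30, 30, 200), (30, 200, 30)
past = [solid(red)]                          # e.g. Jack's red shoe
curtains = [solid(blue), solid(red), solid(green)]
print(pick_image_by_color(past, curtains))   # 1
```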

#

Can't think of anything else

lapis sequoia Sep 12, 2021, 10:18 PM

#

Yeah either it is super simple or gets way too complex for my time span - either way, currently, the majority of the data consists of the correlation between the product image viewed and whether a purchase was subsequently made.

#

I only got 2 weeks

faint ravine Sep 12, 2021, 10:18 PM

#

You could build

lapis sequoia Sep 12, 2021, 10:19 PM

#

So you would go with the different color approach?

faint ravine Sep 12, 2021, 10:21 PM

#

A simple model that takes in the exact images used in all past purchases, and uses images of similar products or other product images as "false data". Now you have a function that tells you whether the user is going to like some image or not.
Then feed this function all images of the selected product, and pick the one with the highest output.
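A minimal sketch of that scoring-function idea, assuming each image has already been reduced to a small feature vector (here, hypothetical mean R, G, B values scaled to 0-1). Logistic regression is an illustrative choice, not a recommendation, and with this little data the scores would be unreliable:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Hypothetical feature vectors per image (mean R, G, B scaled to 0-1).
# Positives: images from the user's past purchases; negatives: "false data".
purchased = np.array([[0.9, 0.1, 0.1], [0.8, 0.2, 0.1], [0.85, 0.15, 0.2]])
not_purchased = np.array([[0.1, 0.1, 0.9], [0.2, 0.8, 0.2], [0.1, 0.3, 0.8]])

X = np.vstack([purchased, not_purchased])
y = np.array([1] * len(purchased) + [0] * len(not_purchased))
scorer = LogisticRegression().fit(X, y)

# Score every image of the product being displayed and show the best one
candidates = np.array([[0.2, 0.2, 0.9], [0.9, 0.2, 0.2], [0.3, 0.9, 0.3]])
scores = scorer.predict_proba(candidates)[:, 1]
best = int(np.argmax(scores))
print(best, scores.round(2))
```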

lapis sequoia Sep 12, 2021, 10:22 PM

#

different product images being?

#

different on the same product?

faint ravine Sep 12, 2021, 10:23 PM

#

But again, there are a lot of edge cases. Because of the small dataset, the model will make some random predictions that have nothing to do with whether the user likes the product or not.

I'd honestly use a more manual approach to this, or target a set of very specific features like color (you can get creative here).

lapis sequoia Sep 12, 2021, 10:24 PM

#

So I could either create a concept of how it could work

#

Or implement some manual approach, but I would prefer the first

#

Since I'm writing my first scientific working paper on it

faint ravine Sep 12, 2021, 10:25 PM

#

Yes, if you try to use a complex deep learning algorithm on this, it'll just be a big black box and return random predictions that you won't appreciate.

faint ravine Sep 12, 2021, 10:27 PM

#

lapis sequoia Since I'm writing my first scientific working paper on it

Well then, let your intuition guide you. Imagine you are the user and you have purchased N products in the past. Based on that history, which image out of a set S best matches your taste?

#

It's a very complex problem, one that I don't think has a generic solution, but you could still get creative.

lapis sequoia Sep 12, 2021, 10:28 PM

#

Yeah, do you think I can create 20 pages by the end of the month on this?

faint ravine Sep 12, 2021, 10:28 PM

#

20 pages of what?

lapis sequoia Sep 12, 2021, 10:28 PM

#

Of paper

#

😄

faint ravine Sep 12, 2021, 10:29 PM

#

Not sure.

lapis sequoia Sep 12, 2021, 10:29 PM

#

So uhm I'm really thinking how I could utilize the data I currently have

faint ravine Sep 12, 2021, 10:29 PM

#

I mean, some complex papers are only 7 pages. Why the obsession with the number?

lapis sequoia Sep 12, 2021, 10:30 PM

#

it's a requirement

#

20-30 pages

faint ravine Sep 12, 2021, 10:30 PM

#

Where do you study lol?

lapis sequoia Sep 12, 2021, 10:30 PM

#

Germany

faint ravine Sep 12, 2021, 10:30 PM

#

I doubt they're looking for something super original

#

Is this a master's thesis?

lapis sequoia Sep 12, 2021, 10:30 PM

#

hmm yeah it is my first one anyways

#

nono

#

pre-bachelors

faint ravine Sep 12, 2021, 10:31 PM

#

So a bachelor's thesis right?

lapis sequoia Sep 12, 2021, 10:31 PM

#

literally my first one at uni, 2nd semester

faint ravine Sep 12, 2021, 10:31 PM

#

Oh

#

Youngster

#

xD

lapis sequoia Sep 12, 2021, 10:31 PM

#

haha ye

faint ravine Sep 12, 2021, 10:32 PM

#

You probably could yea.

#

Just put in lots of figures

lapis sequoia Sep 12, 2021, 10:32 PM

#

Is there anything I could do with the order history? Say if I make the bots place 100 different orders with like 3 products each

#

Then I'd have some sample data, just to get something very basic working, which could use colors etc. once I have the correct data

#

like I use image resolution rn

#

since there are no different colors

#

or do you think that's not doable

#

using resolution as a feature

faint ravine Sep 12, 2021, 10:35 PM

#

Uhm, could be, but the weight of that feature isn't significant. A user might like a high-resolution image, but a lower resolution isn't much of an inconvenience, you know.

lapis sequoia Sep 12, 2021, 10:36 PM

#

oh sure, not talking about production data

#

like it's just randomly selected rn

faint ravine Sep 12, 2021, 10:36 PM

#

Try color and resolution. Give color more emphasis for now. I gotta sleep now. Tell me about your progress tomorrow..
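One way to "give color more emphasis" is a weighted score per image. A sketch, assuming each image comes with a dominant RGB color and pixel dimensions, and that the user's preferred color has already been derived from their purchase history; the weights 0.8/0.2, the helper names and all data are hypothetical.

```python
import math

# Hypothetical weights: color counts four times as much as resolution.
W_COLOR = 0.8
W_RES = 0.2


def color_similarity(rgb_a, rgb_b):
    """1.0 for identical colors, 0.0 for opposite corners of the RGB cube."""
    max_dist = math.sqrt(3 * 255 ** 2)
    return 1 - math.dist(rgb_a, rgb_b) / max_dist


def resolution_score(width, height, max_pixels):
    """Pixel count relative to the largest candidate, capped at 1.0."""
    return min(width * height / max_pixels, 1.0)


def image_score(image, preferred_rgb, max_pixels):
    return (W_COLOR * color_similarity(image["rgb"], preferred_rgb)
            + W_RES * resolution_score(image["width"], image["height"], max_pixels))


preferred = (200, 30, 30)  # e.g. "red", inferred from past purchases
candidates = [
    {"name": "red_small.jpg", "rgb": (210, 40, 35), "width": 640, "height": 480},
    {"name": "blue_large.jpg", "rgb": (30, 40, 200), "width": 1920, "height": 1080},
]
max_pixels = max(c["width"] * c["height"] for c in candidates)
best = max(candidates, key=lambda c: image_score(c, preferred, max_pixels))
print(best["name"])
```

Here the small red image wins despite its lower resolution, because color carries most of the weight.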

lapis sequoia Sep 12, 2021, 10:36 PM

#

Sure, will try to get some colors

#

thx!

light warren Sep 12, 2021, 10:48 PM

#

Hey, I don't know if this is the right channel, but could I ask a question about portfolio optimization using Python?

minor geyser Sep 12, 2021, 10:59 PM

#

Hello, has anyone ever heard of DataCamp? If you have, what are your thoughts on it?

amber blaze Sep 12, 2021, 11:27 PM

#

I'm trying to find a way to hide code cell input in a regular Jupyter Notebook so it just shows the output (ideally with some kind of toggle to show or hide the input) - is there a way to do that? I find instructions for things like 'Jupyter Book' but I guess that is a different thing.

desert oar Sep 12, 2021, 11:39 PM

#

amber blaze I'm trying to find a way to hide code cell input in a regular Jupyter Notebook s...

there's a notebook extension for this https://jupyter-contrib-nbextensions.readthedocs.io/en/latest/nbextensions/hide_input_all/readme.html

amber blaze Sep 12, 2021, 11:39 PM

#

desert oar there's a notebook extension for this https://jupyter-contrib-nbextensions.readt...

I just figured out how to install nbextensions, although I was hoping to be able to hide input on a cell-by-cell basis.

#

Looks like "Hide input" extension might be what I want

desert oar Sep 12, 2021, 11:40 PM

#

there's another extension in that same package that does it, i think

#

look through the list

#

note that if you're running a kernel in a separate environment from jupyter, the extension must be installed in the jupyter environment

amber blaze Sep 12, 2021, 11:42 PM

#

Ok thanks, right now I am just using vanilla Jupyter Notebooks - I'm just starting out with them.

dense lintel Sep 13, 2021, 4:04 AM

#

I have an mp3 file
and I want to check if some part of that mp3 file is present in an mp4 file

bold timber Sep 13, 2021, 4:41 AM

#

Hi, I'm so confused about handling outliers. How do I decide whether a value is an outlier or not?

#

Is there a parameter for deciding whether a value is an outlier or not?

desert oar Sep 13, 2021, 4:53 AM

#

bold timber hi i so confuse to handling outliers. how to decide a value is outliers or not?

an "outlier" is a data point that is not drawn from the same data-generating process as the rest of your data

smoky lava Sep 13, 2021, 4:54 AM

#

Ok, so I've written this small package that recursively crawls links starting from an initial URL, then crawls the URLs on that page, and so on. For example:
```py
crawler = Crawler(
    head_url='https://docs.python.org',
    branching=1,
    max_depth=2,
    url_filter=URLFilter('python.org', True)
)
crawler.crawl()
```
produces the following JSON data:
```json
{
  "3.9.7 Documentation": {
    "url": "https://docs.python.org",
    "links": {
      "3.11.0a0 Documentation": {
        "url": "https://docs.python.org/3.11/",
        "links": {
          "3.9.7 Documentation": {
            "url": "https://docs.python.org/3.9/",
            "links": {}
...
```

#

I don't have any specific idea; I just thought there might be some data-mining utility in this
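For readers wondering what such a crawler does internally, here is a minimal depth-limited recursive sketch. It is not the package described above (the internals of `Crawler` and `URLFilter` aren't shown), so the network is replaced by a hypothetical in-memory `PAGES` dict and `url_filter` is a plain predicate. It reproduces the nested `{title: {"url": ..., "links": {...}}}` shape of the JSON above.

```python
import json
from typing import Callable, Dict, List, Tuple

# Hypothetical stand-in for the network: url -> (page title, links found on the page).
# A real crawler would download each page and parse its <title> and <a href> tags here.
PAGES: Dict[str, Tuple[str, List[str]]] = {
    "https://docs.python.org": (
        "3.9.7 Documentation",
        ["https://docs.python.org/3.11/", "https://example.com/"],
    ),
    "https://docs.python.org/3.11/": ("3.11.0a0 Documentation", ["https://docs.python.org/3.9/"]),
    "https://docs.python.org/3.9/": ("3.9.7 Documentation", ["https://docs.python.org"]),
}


def fetch(url: str) -> Tuple[str, List[str]]:
    # Unknown pages get their URL as the title and no outgoing links.
    return PAGES.get(url, (url, []))


def crawl(url: str, max_depth: int, branching: int,
          url_filter: Callable[[str], bool], depth: int = 0) -> Tuple[str, dict]:
    """Return (title, node) where node = {"url": ..., "links": {child_title: child_node}}."""
    title, links = fetch(url)
    node = {"url": url, "links": {}}
    if depth < max_depth:
        # Follow at most `branching` links that pass the filter.
        for link in [link for link in links if url_filter(link)][:branching]:
            child_title, child_node = crawl(link, max_depth, branching, url_filter, depth + 1)
            node["links"][child_title] = child_node  # note: equal titles overwrite each other
    return title, node


title, root = crawl("https://docs.python.org", max_depth=2, branching=1,
                    url_filter=lambda u: "python.org" in u)
tree = {title: root}
print(json.dumps(tree, indent=2))
```

The depth limit alone guarantees termination here; a set of visited URLs would additionally avoid fetching the same page twice.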

bold timber Sep 13, 2021, 4:58 AM

#

desert oar an "outlier" is a data point that is not drawn from the same data-generating pro...

Can we handle outliers in the target variable?

west dagger Sep 13, 2021, 5:14 AM

#

is there any way to render a dynamic web page besides requests_html?

chilly geyser Sep 13, 2021, 5:14 AM

#

desert oar an "outlier" is a data point that is not drawn from the same data-generating pro...

I wouldn't use this definition

#

Outliers can be extreme events and they can stem from the same data-generating process

smoky lava Sep 13, 2021, 5:19 AM

#

west dagger is there any way to render a dynamic web page besides `requests_html`?

I tried requests_html and it didn't work for me. I played around with Selenium a bit; Selenium will render JS.

west dagger Sep 13, 2021, 5:21 AM

#

smoky lava I tried requests_html, didn't work for me, played around with selenium a bit, Se...

yeah, I can't get it to work either

#

so I'm stuck with Selenium then

smoky lava Sep 13, 2021, 5:29 AM

#

Looks like your options are to use PyQt's web browser utilities to handle the JavaScript, or to use Selenium. It does look like you can use Selenium to load specific snippets of JavaScript instead of loading the whole page. That could save a lot of time.

#

@west dagger

west dagger Sep 13, 2021, 5:31 AM

#

smoky lava Look like your options are use PyQT's web browser utilities to read the javascri...

thank you, my man. I'll get into the Selenium docs

smoky lava Sep 13, 2021, 5:35 AM

#

no problem!

lapis sequoia Sep 13, 2021, 6:40 AM

#

Guys, any idea which metric to use to check the performance of a neural topic model? And why should we use that method?

serene scaffold Sep 13, 2021, 6:48 AM

#

lapis sequoia guys, any idea what would be the metric to use to check the performance of a neu...

I can check on this in a few hours. One of my coworkers was working on that but I can only get to it on my work computer

#

!remind 5h topic modeling

arctic wedgeBOT Sep 13, 2021, 6:48 AM

#

You got it!

Your reminder will arrive on <t:1631533733:F>!

lapis sequoia Sep 13, 2021, 6:49 AM

#

sure, that would be really nice! thanksss man!

#

https://tenor.com/view/awesome-minions-excited-gah-i-cant-even-gif-14537236


iron basalt Sep 13, 2021, 7:31 AM

#

frosty flame ERROR: Could not install packages due to an OSError: [WinError 5] Access is deni...

Run the console as administrator.

#

(right click on cmd prompt -> run as administrator (popup will ask for permission))

#

(programs run without admin cannot touch stuff in certain folders like program files, system32, etc)

#

(though i'm not sure why it would need permissions, maybe some conda thing)

royal crest Sep 13, 2021, 7:38 AM

#

iron basalt (though i'm not sure why it would need permissions, maybe some conda thing)

conda thing when doing pip install tensorflow?

#

shouldn't need root access for pip

#

or uhh what's the windows equivalent

#

Admin

iron basalt Sep 13, 2021, 7:39 AM

#

royal crest conda thing when doing `pip install tensorflow`?

It's admin, not root, but yeah idk, it seems to involve conda somehow from the file path.

#

Windows has further separation between the real root and admins.

royal crest Sep 13, 2021, 7:40 AM

#

oh turns out they had their problems fixed

iron basalt Sep 13, 2021, 7:40 AM

#

Not meant to use the real root unless you need to do a recovery boot.

royal crest Sep 13, 2021, 7:40 AM

#

4 messages below with a thank you

iron basalt Sep 13, 2021, 7:40 AM

#

(Or to uninstall something that won't let itself be uninstalled; that bug is still there)

royal crest Sep 13, 2021, 7:41 AM

#

good to know

slate fox Sep 13, 2021, 8:53 AM

#

Hello

#

I am trying to run some Kaggle code in my Google Colab to understand it, but I am getting an error

#

#

can anyone help me

#

i am not able to find this on the net

tidal bough Sep 13, 2021, 8:59 AM

#

slate fox

The normal way to get a column of a dataframe is df["column name"]. If the column name is a valid identifier, like mycoolcolumn, then you can also do it as df.mycoolcolumn. Check your column names - presumably, text_sent is not the column name.
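A small sketch of the two access styles and of checking the column names first; the DataFrame and its columns are hypothetical, and `text_sent` is just the name from the error being discussed.

```python
import pandas as pd

# Hypothetical DataFrame; in the notebook, use the one loaded from the Kaggle data.
df = pd.DataFrame({"text": ["hello", "world"], "label": [0, 1]})

print(df.columns.tolist())  # first check which column names actually exist

by_key = df["text"]  # works for any column name, including ones with spaces
by_attr = df.text    # only works when the name is a valid Python identifier
print(by_key.equals(by_attr))  # True

print("text_sent" in df.columns)  # False: no such column in this DataFrame
try:
    df.text_sent
except AttributeError as err:
    print("AttributeError:", err)
```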

slate fox Sep 13, 2021, 9:07 AM

#

okay got it thanks

rigid zodiac Sep 13, 2021, 11:10 AM

#

Morning everyone, quick question: have you ever used an LSTM, CNN, or ConvLSTM model? If yes, do you shuffle your data and randomly split it into train and test sets?

slate fox Sep 13, 2021, 11:31 AM

#

Is anyone good with Docker/Kubernetes who could help me understand something, or do you know a server where I can get that help?

royal crest Sep 13, 2021, 11:32 AM

#

not sure if containerisation is data science or AI related

rigid zodiac Sep 13, 2021, 11:41 AM

#

royal crest not sure if containerisation is data science or AI related

can you share it with me in DMs?

royal crest Sep 13, 2021, 11:46 AM

#

what do you want me to share exactly

#

https://www.docker.com/


rigid zodiac Sep 13, 2021, 11:46 AM

#

royal crest what do you want me to share exactly

the server name, perhaps the link too

buoyant adder Sep 13, 2021, 11:48 AM

#

Why shouldn't you directly remove outliers? To understand, check out this video:
https://youtu.be/BAQvZntcOpo


arctic wedgeBOT Sep 13, 2021, 11:48 AM

#

serene scaffold !remind 5h topic modeling

It has arrived!

Here's your reminder: topic modeling

rigid zodiac Sep 13, 2021, 11:51 AM

#

Anybody know how to stack frames?

velvet thorn Sep 13, 2021, 12:07 PM

#

rigid zodiac Any body know how to stack frames?

meaning?

rigid zodiac Sep 13, 2021, 12:08 PM

#

I'm working with an mmWave radar and wondering how I can stack all the frames together

drifting mason Sep 13, 2021, 12:12 PM

#

Hey guys, I need an idea

#

I have a list of molecule IDs from a database

#

I have their corresponding names

#

All I wanna do is search the IDs in the database (online database called ChEBI) and check if I have the proper names

#

example

#

Suppose there is an online fruit database

#

The ID of Apple is 11111

#

I wanna make a Python program that searches the online fruit database for 22222 and checks whether it really is mango

#

How can I do that

#

Please help me

royal crest Sep 13, 2021, 12:16 PM

#

I believe ChEMBL has an API you can use

#

https://www.ebi.ac.uk/chembl/api/data/docs

#

see here

drifting mason Sep 13, 2021, 12:17 PM

#

Sorry, it's ChEBI

royal crest Sep 13, 2021, 12:18 PM

#

check out libChEBI

#

https://jcheminf.biomedcentral.com/articles/10.1186/s13321-016-0123-9


#

this article explains it in detail

#

https://github.com/libChEBI/libChEBIpy


#

this one for the github
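A sketch of the checking logic itself, kept independent of any particular client so it runs offline: `check_names` and the fake database below are hypothetical, mirroring the fruit example. For the real lookup, the libChEBIpy README shows an entity class used along the lines of `ChebiEntity(id).get_name()`; treat that call as an assumption and confirm it against the project's documentation.

```python
from typing import Callable, Dict, Optional, Tuple


def check_names(expected: Dict[str, str],
                lookup: Callable[[str], Optional[str]]) -> Dict[str, Tuple[str, Optional[str]]]:
    """Return {id: (expected_name, database_name)} for every ID whose names disagree."""
    mismatches = {}
    for entity_id, name in expected.items():
        db_name = lookup(entity_id)  # None means "ID not found"
        if db_name is None or db_name.strip().lower() != name.strip().lower():
            mismatches[entity_id] = (name, db_name)
    return mismatches


# Offline stand-in for the online database (hypothetical data).
fake_db = {"11111": "Apple", "22222": "Mango"}
my_names = {"11111": "apple", "22222": "Banana", "33333": "Cherry"}

print(check_names(my_names, fake_db.get))

# Assumed real lookup with libChEBIpy (verify against its README):
#   from libchebipy import ChebiEntity
#   lookup = lambda chebi_id: ChebiEntity(chebi_id).get_name()
```

Passing the lookup in as a function keeps the comparison testable without network access and lets you swap in whichever client you end up using.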

drifting mason Sep 13, 2021, 12:19 PM

#

royal crest this article explains it in detail

Oh cool! Thanks!

hardy cloud Sep 13, 2021, 12:31 PM

#

Hey everyone! I've been learning Python for a year and it's my first language. I want to enter the data science field, but I'm afraid I don't have the required skills, so my question is: what should I be able to do before learning data science?

rigid zodiac Sep 13, 2021, 12:31 PM

#

the ability to not give up... tbh

#

also, you basically have to read a lot of Stack Overflow and GitHub, and be able to understand the errors

gaunt marsh Sep 13, 2021, 12:37 PM

#

I am using Matplotlib and I have a bar chart. Is it possible to plot directly to a high-resolution image file?

#

not a 640x480 png file but maybe a 4K picture with good zoomability

royal crest Sep 13, 2021, 12:41 PM

#

!d matplotlib.pyplot.figure

arctic wedgeBOT Sep 13, 2021, 12:41 PM

#

matplotlib.pyplot.figure


matplotlib.pyplot.figure(num=None, figsize=None, dpi=None, facecolor=None, edgecolor=None, frameon=True, FigureClass=<class 'matplotlib.figure.Figure'>, clear=False, **kwargs)
Create a new figure.

royal crest Sep 13, 2021, 12:42 PM

#

experiment around the params figsize and dpi
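For example, the pixel size is figsize (inches) times dpi, so 16 × 9 inches at 240 dpi gives 3840 × 2160 (4K UHD). A sketch that saves to an in-memory PNG and reads the size back from the PNG header; the bar data is made up:

```python
import io
import struct

import matplotlib
matplotlib.use("Agg")  # render off-screen, no window needed
import matplotlib.pyplot as plt

# 16 x 9 inches at 240 dots per inch -> 3840 x 2160 pixels.
fig, ax = plt.subplots(figsize=(16, 9), dpi=240)
ax.bar(["a", "b", "c"], [3, 7, 5])

buf = io.BytesIO()
fig.savefig(buf, format="png")  # or fig.savefig("chart.png") to write a file
plt.close(fig)

# A PNG stores width and height as big-endian 32-bit ints at bytes 16-24.
width, height = struct.unpack(">II", buf.getvalue()[16:24])
print(width, height)
```

For unlimited zoom, a vector format such as `fig.savefig("chart.svg")` or `.pdf` avoids pixels altogether.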

gaunt marsh Sep 13, 2021, 1:26 PM

#

royal crest experiment around the params `figsize` and `dpi`

thanks. That worked.

Is there any way to filter duplicates from a NumPy array?

tender hearth Sep 13, 2021, 1:32 PM

#

gaunt marsh thanks. That worked. Is there any possibility to filter duplicates from a numpy...

np.unique

velvet thorn Sep 13, 2021, 1:35 PM

#

tender hearth `np.unique`

I always wonder why np.unique's return value is so clunky

#

wait

#

it's not

#

must have been thinking of something else

#

or did it get changed

gaunt marsh Sep 13, 2021, 1:42 PM

#

Bildschirmfoto_2021-09-13_um_15.42.08.png

#

it's like that @velvet thorn

velvet thorn Sep 13, 2021, 1:42 PM

#

gaunt marsh its like that <@!171929073063297024>

what?

gaunt marsh Sep 13, 2021, 1:42 PM

#

this is my code with np.unique

velvet thorn Sep 13, 2021, 1:43 PM

#

gaunt marsh this is my code with np.unique

okay...?

gaunt marsh Sep 13, 2021, 1:43 PM

#

velvet thorn okay...?

and this with np.array

Bildschirmfoto_2021-09-13_um_15.43.29.png

velvet thorn Sep 13, 2021, 1:44 PM

#

gaunt marsh and this with np.array

...but why are you telling me

lapis sequoia Sep 13, 2021, 1:44 PM

#

quick question:

Can an inner product be something other than the dot product, say for R^2?

Reasoning behind the question:
MathWorld says "An inner product is a generalization of the dot product."
Although it does mention multiplying, I'd like to know whether we can define our own.
from: https://mathworld.wolfram.com/InnerProduct.html

Also, assuming we can have some different inner product, can we build a Hilbert space with it?
If yes, is there an example of that?
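Yes: any symmetric positive-definite matrix A gives an inner product on R^2 via <x, y>_A = x^T A y, and the ordinary dot product is the special case A = I. A worked example with A = diag(2, 1), picked here for illustration, checking the three axioms for real scalars a, b:

```latex
\begin{aligned}
\langle x, y \rangle_A &= x^\top A y, \qquad
  A = \begin{pmatrix} 2 & 0 \\ 0 & 1 \end{pmatrix}
  \;\Rightarrow\; \langle x, y \rangle_A = 2 x_1 y_1 + x_2 y_2 \\
\text{symmetry:}\quad & \langle x, y \rangle_A = \langle y, x \rangle_A \quad (\text{since } A^\top = A) \\
\text{linearity:}\quad & \langle a x + b z, y \rangle_A = a \langle x, y \rangle_A + b \langle z, y \rangle_A \\
\text{positivity:}\quad & \langle x, x \rangle_A = 2 x_1^2 + x_2^2 > 0 \quad \text{for } x \neq 0
\end{aligned}
```

Since every finite-dimensional inner-product space is complete, (R^2, <., .>_A) is already a Hilbert space. A standard infinite-dimensional example is L^2[0, 1] with <f, g> equal to the integral of f(t) g(t) over [0, 1].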

gaunt marsh Sep 13, 2021, 1:44 PM

#

velvet thorn I always wonder why `np.unique`'s return value is so clunky

because of this comment. nvm

velvet thorn Sep 13, 2021, 1:45 PM

#

gaunt marsh because of this comment. nvm

that wasn't really what I meant though

#

but also

#

I was thinking of something else

gaunt marsh Sep 13, 2021, 2:10 PM

#

I have an Array with multiple sub-arrays. I want to compare the subarrays to each other and filter out the duplicate subarrays. The subarrays contain integer values. What is the best way to do that?

lapis sequoia Sep 13, 2021, 2:11 PM

#

Sorting takes O(n log n); after that, removing dupes can be done in O(n), since equal items end up next to each other.
(Not entirely sure, though.)
Also, you could ask this question in #algos-and-data-structs.
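A sketch of the sort-then-scan approach described above, treating each subarray as a tuple so that sorting compares them lexicographically. `dedupe_sorted` is a hypothetical helper name:

```python
def dedupe_sorted(subarrays):
    """Remove duplicate subarrays by sorting, then one linear pass.

    Sorting needs O(n log n) comparisons; afterwards equal subarrays are
    adjacent, so a single O(n) scan keeps only the first of each run.
    The original order is not preserved.
    """
    items = sorted(tuple(s) for s in subarrays)  # tuples compare lexicographically
    result = []
    for item in items:
        if not result or result[-1] != item:
            result.append(item)
    return result


print(dedupe_sorted([[1, 2, 3], [4, 5], [1, 2, 3], [0, 9]]))
```

If order doesn't matter, `set(map(tuple, arr))` gets the same unique subarrays in expected O(n) by hashing instead of sorting.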

dusty anchor Sep 13, 2021, 2:17 PM

#

Hey guys, does anyone know what the difference is between UNET and the UNET Xception used here: https://keras.io/examples/vision/oxford_pets_image_segmentation/ ? To me it seems like it skips the concatenation part, but I'm not sure what the residual is for then...


faint ravine Sep 13, 2021, 2:20 PM

#

Can someone help with offline speech recognition?

light warren Sep 13, 2021, 2:35 PM

#

hey, could someone help me with this error

#

https://gyazo.com/dbf6d2c0dc520d4eac541c8eee33eeeb


lone drum Sep 13, 2021, 2:37 PM

#

When I replace my -9999 values with NaN, they don't get replaced. I am using pandas.
My code:
```python
import numpy as np
import pandas as pd

df = pd.read_csv(f'{path}{file_name}{extension}')
print(df)
print()

# replace -9999 values with NaN values
new_df = df.replace(-9999, np.nan)
print(new_df)
```
Output:
```
        day  temperature  windspeed  event
0  1/1/2017           32          6   Rain
1  1/2/2017       -99999          7  Sunny
2  1/3/2017           28     -99999   Snow
3  1/4/2017       -99999          7      0
4  1/5/2017           32     -99999   Rain
5  1/6/2017           31          2  Sunny
6  1/6/2017           34          5      0

        day  temperature  windspeed  event
0  1/1/2017           32          6   Rain
1  1/2/2017       -99999          7  Sunny
2  1/3/2017           28     -99999   Snow
3  1/4/2017       -99999          7      0
4  1/5/2017           32     -99999   Rain
5  1/6/2017           31          2  Sunny
6  1/6/2017           34          5      0
```
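A likely cause, judging from the printed data: the sentinel in the file is -99999 (five nines), while the code replaces -9999 (four nines), so `replace` finds nothing to change. A small sketch with a few of the rows above inlined as a string:

```python
import io

import numpy as np
import pandas as pd

raw = (
    "day,temperature,windspeed,event\n"
    "1/1/2017,32,6,Rain\n"
    "1/2/2017,-99999,7,Sunny\n"
    "1/3/2017,28,-99999,Snow\n"
)
df = pd.read_csv(io.StringIO(raw))

unchanged = df.replace(-9999, np.nan)  # four nines: matches nothing in this data
fixed = df.replace(-99999, np.nan)     # five nines: the sentinel actually used
print(unchanged.isna().sum().sum(), fixed.isna().sum().sum())

# Alternatively, let read_csv treat the sentinel as missing while parsing.
df_na = pd.read_csv(io.StringIO(raw), na_values=[-99999])
print(df_na)
```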

gaunt marsh Sep 13, 2021, 2:39 PM

#

lapis sequoia sorting out can take O(nlogn) after that removing dupes(if you sort that way) ca...

removing is what I meant and has a better performance with O(n).

hardy cloud Sep 13, 2021, 2:51 PM

#

gaunt marsh I have an Array with multiple sub-arrays. I want to compare the subarrays to eac...

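# arr: a list of lists (call arr.tolist() first if it is a NumPy array)
# keeps each subarray only at its first occurrence, dropping later duplicates
newarr = [sub for i, sub in enumerate(arr) if sub not in arr[:i]]
# this should work!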

#

for a lot of data this code is slow, O(n^2); search Stack Overflow for a faster one!

lapis sequoia Sep 13, 2021, 3:00 PM

#

serene scaffold I can check on this in a few hours. One of my coworkers was working on that but ...

reminding you 😉

serene scaffold Sep 13, 2021, 3:00 PM

#

lapis sequoia reminding you 😉

huh yes let me see

#

still looking

#

sigh, the explanation is really long. let me try to figure it out.

gaunt marsh Sep 13, 2021, 3:11 PM

#

hardy cloud ```py newarr = [[element for element in arr[i] if element not in arr[:i] +arr[i+...

Okay, thanks. What about looking for duplicates and removing them if there are fewer than 3?

#

For example: I want only the subarrays that appear at least 3 times in the outer array

#

so every unique subarray and just normal duplicates (two times) will be removed

lapis sequoia Sep 13, 2021, 3:13 PM

#

making a dict?
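A sketch of the dict approach for "at least 3 times": subarrays are converted to tuples (lists and NumPy arrays aren't hashable), counted, and filtered. `keep_frequent` and the sample data are hypothetical:

```python
def keep_frequent(subarrays, min_count=3):
    """Return {subarray_tuple: count} for subarrays occurring at least min_count times."""
    counts = {}
    for sub in subarrays:
        key = tuple(sub)  # lists/arrays aren't hashable; tuples are
        counts[key] = counts.get(key, 0) + 1
    return {key: n for key, n in counts.items() if n >= min_count}


data = [[1, 2, 3], [4, 5, 6], [1, 2, 3], [1, 2, 3], [4, 5, 6], [7, 8, 9]]
print(keep_frequent(data))               # only (1, 2, 3) occurs 3 times
print(keep_frequent(data, min_count=2))  # also keeps (4, 5, 6), which occurs twice
```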

gaunt marsh Sep 13, 2021, 3:14 PM

#

Or let's say: I want to remove every subarray that has no duplicates. I tried it with unique, but it doesn't seem to work in my chart

desert oar Sep 13, 2021, 3:14 PM

#

chilly geyser Outliers can be extreme events and they can stem from the same data-generating p...

that's fair, but i don't know if they should be handled the same way or universally "removed" the way people like to remove outliers

serene scaffold Sep 13, 2021, 3:14 PM

#

lapis sequoia reminding you 😉

I haven't found the paper associated with it yet. unfortunately I have to get back to working on it, but I might figure it out since that's what I'm working on today.

gaunt marsh Sep 13, 2021, 3:15 PM

#

if you look at my chart, there are a lot of bars which only appear once. every bar represents a subarray. I want to remove the unique ones

Bildschirmfoto_2021-09-13_um_17.15.00.png

desert oar Sep 13, 2021, 3:15 PM

#

gaunt marsh I have an Array with multiple sub-arrays. I want to compare the subarrays to eac...

is this a 2-dimensional numpy array?

np.array([
    [1, 2, 3],
    [4, 5, 6],
    [1, 2, 3],
])

or a 1-dimensional array of unknown-size lists/arrays?

np.array([
    [1, 2, 3],
    [4, 5],
    [6, 7, 8, 9],
    [4, 5],
], dtype='object')

#

oh, this thing

#

i told you, use a Counter and remove all the elements with count 1

#

that should hopefully remove the vast majority of your data

gaunt marsh Sep 13, 2021, 3:16 PM

#

okay, the collections Counter, which has to be imported, right?

desert oar Sep 13, 2021, 3:16 PM

#

!e ```python
from collections import Counter
data = list('aabcabdafbba')
item_counter = Counter(data)
item_counter_dup = {k: v for k, v in item_counter.items() if v > 1}
print(item_counter)
print(item_counter_dup)
```

arctic wedgeBOT Sep 13, 2021, 3:17 PM

#

@desert oar :white_check_mark: Your eval job has completed with return code 0.

001 | Counter({'a': 5, 'b': 4, 'c': 1, 'd': 1, 'f': 1})
002 | {'a': 5, 'b': 4}

lapis sequoia Sep 13, 2021, 3:24 PM

#

hello>

#

my python AI is not taking my voice input

#

what shall i do

#

?

gaunt marsh Sep 13, 2021, 3:43 PM

#

desert oar i told you, use a `Counter` and remove all the elements with count `1`

I tried this:

print ([a for a in ara if ara.count(a) == 1])

but it says 'numpy.ndarray' object has no attribute 'count'

#

What's the matter with numpy-array? Do I have to use a DataFrame for that?

desert oar Sep 13, 2021, 4:06 PM

#

gaunt marsh What's the matter with numpy-array? Do I have to use a DataFrame for that?

this will do a linear scan over ara for every value of a

#

and yes, there's no numpy ndarray count method

#

!e

from collections import Counter
data = list('aabcabdafbba')
item_counter = Counter(data)

print('All counts:', dict(item_counter))
print('Unique counts:', {k: v for k, v in item_counter.items() if v == 1})
print('Duplicated counts:', {k: v for k, v in item_counter.items() if v > 1})

arctic wedgeBOT Sep 13, 2021, 4:07 PM

#

@desert oar :white_check_mark: Your eval job has completed with return code 0.

001 | All counts: {'a': 5, 'b': 4, 'c': 1, 'd': 1, 'f': 1}
002 | Unique counts: {'c': 1, 'd': 1, 'f': 1}
003 | Duplicated counts: {'a': 5, 'b': 4}

desert oar Sep 13, 2021, 4:08 PM

#

i also told you to use tuples for this

#

numpy isn't helpful for this problem

#

you want a list of RGB tuples

#

[(r1, g1, b1), (r2, g2, b2), ...]

gaunt marsh Sep 13, 2021, 4:16 PM

#

Okay, thanks. I have it as tuples but commented that out. I played around with np.array and np.unique

pine wolf Sep 13, 2021, 4:21 PM

#

well, no count method, but you can do :

In [47]: a = np.array(list('aabcabdafbba'))

In [48]: (a == 'a').sum()
Out[48]: 5

desert oar Sep 13, 2021, 4:24 PM

#

oh you probably can do this with numpy unique

#

!e ```python
import numpy as np

rgbs = np.array([
    [1, 2, 3],
    [4, 5, 6],
    [1, 2, 3],
    [4, 5, 7],
])

rgb_uniq, rgb_uniq_idx, rgb_counts = np.unique(rgbs, return_index=True, return_counts=True, axis=1)

print(rgb_uniq)
print(rgb_uniq_idx)
print(rgb_counts)
```

arctic wedgeBOT Sep 13, 2021, 4:26 PM

#

@desert oar :white_check_mark: Your eval job has completed with return code 0.

001 | [[1 2 3]
002 |  [4 5 6]
003 |  [1 2 3]
004 |  [4 5 7]]
005 | [0 1 2]
006 | [1 1 1]

desert oar Sep 13, 2021, 4:27 PM

#

hm, i wonder why that first one is not deduplicated

#

i'll have to experiment with this

#

!e ```python
import numpy as np

rgbs = np.array([
    [1, 2, 3],
    [4, 5, 6],
    [1, 2, 3],
    [4, 5, 7],
])

rgb_uniq, rgb_uniq_idx, rgb_counts = np.unique(
    rgbs, return_index=True, return_counts=True, axis=0
)

print(rgb_uniq)
print(rgb_uniq_idx)
print(rgb_counts)
```

arctic wedgeBOT Sep 13, 2021, 4:28 PM

#

@desert oar :white_check_mark: Your eval job has completed with return code 0.

001 | [[1 2 3]
002 |  [4 5 6]
003 |  [4 5 7]]
004 | [0 1 3]
005 | [2 1 1]

desert oar Sep 13, 2021, 4:28 PM

#

there we go @gaunt marsh ☝️

#

that would work if you already have a numpy array of rgb values, e.g. 700k rows and 3 columns (or 4 if you have alpha)

#

!e ```python
import numpy as np

# Every row is an RGBA quad
rgbas = np.array([
    [0.50, 0.75, 0.25, 1.00],
    [0.25, 0.50, 0.75, 1.00],
    [0.50, 0.75, 0.25, 0.50],
    [0.25, 0.50, 0.75, 1.00],
]) * 255
print(rgbas)

# Every row is an RGB triple (drop the A channel)
rgbs = rgbas[:-1]
print(rgbs)

rgb_uniq, rgb_uniq_idx, rgb_counts = np.unique(
    rgbs, return_index=True, return_counts=True, axis=0
)
print(rgb_uniq)
print(rgb_uniq_idx)
print(rgb_counts)
```

arctic wedgeBOT Sep 13, 2021, 4:31 PM

#

@desert oar :white_check_mark: Your eval job has completed with return code 0.

001 | [[127.5  191.25  63.75 255.  ]
002 |  [ 63.75 127.5  191.25 255.  ]
003 |  [127.5  191.25  63.75 127.5 ]
004 |  [ 63.75 127.5  191.25 255.  ]]
005 | [[127.5  191.25  63.75 255.  ]
006 |  [ 63.75 127.5  191.25 255.  ]
007 |  [127.5  191.25  63.75 127.5 ]]
008 | [[ 63.75 127.5  191.25 255.  ]
009 |  [127.5  191.25  63.75 127.5 ]
010 |  [127.5  191.25  63.75 255.  ]]
011 | [1 2 0]
... (truncated - too many lines)

Full output: https://paste.pythondiscord.com/ayepatiyey.txt?noredirect
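A note on the snippet above: `rgbas[:-1]` slices rows, so it drops the last row and keeps all four channels (the printed `rgbs` has 3 rows of 4 values). Dropping the alpha channel needs a column slice, `rgbas[:, :-1]`. A sketch of the difference, using the same data:

```python
import numpy as np

rgbas = np.array([
    [0.50, 0.75, 0.25, 1.00],
    [0.25, 0.50, 0.75, 1.00],
    [0.50, 0.75, 0.25, 0.50],
    [0.25, 0.50, 0.75, 1.00],
]) * 255

print(rgbas[:-1].shape)     # (3, 4): last ROW dropped, alpha column still there
print(rgbas[:, :-1].shape)  # (4, 3): last COLUMN (the A channel) dropped

rgbs = rgbas[:, :-1]
rgb_uniq, rgb_counts = np.unique(rgbs, return_counts=True, axis=0)
print(rgb_uniq)
print(rgb_counts)  # both RGB colors occur twice once alpha is ignored
```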

desert oar Sep 13, 2021, 4:31 PM

#

so you would use rgb_uniq_idx as the bar position in the plot, and rgb_counts as the bar size
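To actually drop the colors that occur only once, the counts can serve as a boolean mask. Note that `axis=0` compares whole rows; the earlier `axis=1` attempt compared columns, which is why nothing was merged there. A sketch with made-up data:

```python
import numpy as np

rgbs = np.array([
    [1, 2, 3],
    [4, 5, 6],
    [1, 2, 3],
    [4, 5, 7],
    [1, 2, 3],
])

rgb_uniq, rgb_counts = np.unique(rgbs, return_counts=True, axis=0)

# Keep only colors that occur more than once (use >= 3 for "at least three times").
keep = rgb_counts > 1
print(rgb_uniq[keep])
print(rgb_counts[keep])
```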

hard gull Sep 13, 2021, 5:35 PM

#

Is there a good way of learning AI in Python?

grave frost Sep 13, 2021, 6:24 PM

#

hard gull Is there an good way of learning ai in python?

check the pinned tab

serene dragon Sep 13, 2021, 6:41 PM

#

Is there any easy way to color scatter plot in Pandas according to column values?

serene scaffold Sep 13, 2021, 6:46 PM

#

serene dragon Is there any easy way to color scatter plot in Pandas according to column values...

you just need to add a column to the dataframe that tells you what color the dot for each row should be, and then use DataFrame.plot.scatter

#

https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.plot.scatter.html?highlight=plot scatter#pandas.DataFrame.plot.scatter
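A sketch of that approach with a hypothetical `group` column: map each category to a matplotlib color name, then pass the resulting colors as `c`. The color mapping is an arbitrary choice:

```python
import matplotlib
matplotlib.use("Agg")  # off-screen rendering for this example
import pandas as pd

df = pd.DataFrame({
    "x": [1, 2, 3, 4],
    "y": [4, 1, 3, 2],
    "group": ["A", "B", "A", "B"],  # hypothetical column to color by
})

# Extra column holding the color for each row's dot.
df["color"] = df["group"].map({"A": "tab:blue", "B": "tab:orange"})

ax = df.plot.scatter(x="x", y="y", c=df["color"].tolist())
```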

gleaming osprey Sep 13, 2021, 6:51 PM

#

plz roast immediately.
```python
classifications = []

for i in data['Rating']:
    if i < 3:
        classifications.append('Average')

    elif i >= 3 and i < 7:
        classifications.append('Great')

    elif i >= 7:
        classifications.append('Excellent')

data['Classifications'] = classifications
```

#

data['Rating'] holds values between 1 and 10, and I have to classify them. Question: Classify Movies Based on Ratings [Excellent, Good, and Average]

serene scaffold Sep 13, 2021, 6:55 PM

#

gleaming osprey plz roast immediatly. ```python classifications = [] for i in data['Rating']: ...

rating = data['Rating']
data['Classifications'] = 'Average'
# parentheses are required: & binds tighter than >= and <
data.loc[(rating >= 3) & (rating < 7), 'Classifications'] = 'Great'
data.loc[rating >= 7, 'Classifications'] = 'Excellent'
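A different technique for the same three bins is `pd.cut`, which assigns labels from bin edges in one call. A sketch with made-up ratings; `right=False` makes the bins left-closed, matching the loop's `i < 3`, `3 <= i < 7`, `i >= 7` boundaries:

```python
import numpy as np
import pandas as pd

data = pd.DataFrame({"Rating": [1.5, 3.0, 6.9, 7.0, 9.2]})

# Bins: [-inf, 3) -> Average, [3, 7) -> Great, [7, inf) -> Excellent
data["Classifications"] = pd.cut(
    data["Rating"],
    bins=[-np.inf, 3, 7, np.inf],
    labels=["Average", "Great", "Excellent"],
    right=False,
)
print(data)
```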

serene dragon Sep 13, 2021, 7:26 PM

#

#

When I used custom names for the legend

#

it got disconnected from hue

#

import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

df = pd.read_csv('https://raw.githubusercontent.com/dspiegel29/ArtofStatistics/master/00-1-age-and-year-of-deathofharold-shipmans-victims/00-1-shipman-confirmed-victims-x.csv', sep=",", decimal='.')
plt.figure(figsize=[12,10])
plt.rc('font', size=15)
x = sns.scatterplot(x='fractionalDeathYear', y='Age', data=df, hue='gender2')
plt.ylabel('Wiek ofiary')  # Polish: "victim's age"
plt.xlabel('rok')  # Polish: "year"
plt.legend(['Kobieta','Mężczyzna'], loc='upper left')  # Polish: "Woman", "Man"

bold timber Sep 13, 2021, 8:39 PM

#

Can I replace outliers with the median?

serene scaffold Sep 13, 2021, 8:46 PM

#

bold timber do I can to replace outlier by median?

how do you decide what an outlier is?

bold timber Sep 13, 2021, 8:47 PM

#

serene scaffold how do you decide what an outlier is?

because the value is so far from the others

serene scaffold Sep 13, 2021, 8:47 PM

#

bold timber because the value is so far than others

how far?

bold timber Sep 13, 2021, 8:48 PM

#

6500000, while the others are around 100-100000

serene scaffold Sep 13, 2021, 8:48 PM

#

bold timber 6500000 and others around 100-100000

so, any value that's over 100000?

bold timber Sep 13, 2021, 8:49 PM

#

yes, like 150000

serene scaffold Sep 13, 2021, 8:50 PM

#

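# assumes every column is numeric; df.median() is the per-column median, aligned on columns via axis=1
df = df.mask(df > 100_000, df.median(), axis=1)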

#

this will do it with the median of every column, I believe.

bold timber Sep 13, 2021, 8:52 PM

#

serene scaffold this will do it with the median of every column, I believe.

What if a value doesn't make sense in a column, like a 0? Can I do the same?

serene scaffold Sep 13, 2021, 8:52 PM

#

bold timber how if the value is doesn't make sense into one column like a 0. can i do this s...

df = df.mask((df > 100_000) | (df == 0), df.median(), axis=1)  # parentheses needed: | binds tighter than > and ==

bold timber Sep 13, 2021, 8:53 PM

#

serene scaffold ```py df[df > 100_000 | df == 0] = df.median(axis=1) ```

ok thank u for the answer

gleaming osprey Sep 13, 2021, 8:54 PM

#

serene scaffold ```py rating = data['Rating'] data['Classifications'] = 'Average' data.loc[ratin...

ok, thanks for teaching me .loc, it helps a ton!

serene scaffold Sep 13, 2021, 8:59 PM

#

bold timber ok thank u for the answer

let me know if it doesn't work. Pandas is very fickle and it's difficult to predict what operations will work as expected.

crisp wing Sep 13, 2021, 9:02 PM

#

Does anyone know how to get statsmodels.api.OLS.predict working with train/test-split data, like with sklearn's predict?
Using pandas DataFrames:

from sklearn.model_selection import train_test_split
import statsmodels.api as sm

df_x_train, df_x_test, df_y_train, df_y_test = train_test_split(
    df.loc[:, df.columns != 'Y'], df.loc[:, 'Y'], train_size=0.75, random_state=42
)

df_x_train = sm.add_constant(df_x_train)
df_x_test = sm.add_constant(df_x_test)
model_lm = sm.OLS(df_y_train, df_x_train)
result = model_lm.fit()
df_pred = model_lm.predict(df_x_test)  # ValueError here

Gives me a

ValueError: shapes (11976178,3) and (11976178,3) not aligned: 3 (dim 1) != 11976178 (dim 0)

To me it seems like the predict function assumes df_x_test is the same size as df_x_train?

bold timber Sep 13, 2021, 9:03 PM

#

serene scaffold let me know if it doesn't work. Pandas is very fickle and it's difficult to pred...

I have a car price dataset with a column Mileage_kmpl that contains 0.0 values. Can I replace them with the median? I want to do this because 0.0 doesn't make sense, I guess.

serene scaffold Sep 13, 2021, 9:08 PM

#

bold timber I have a car price dataset. I have a column is Mileage_kmpl that has 0.0. Do i c...

milage = df['Mileage_kmpl']
df.loc[milage > 100_000 | milage == 0, 'Mileage_kmpl'] = milage.median()

bold timber Sep 13, 2021, 9:09 PM

#

serene scaffold ```py milage = df['Mileage_kmpl'] df.loc[milage > 100_000 | milage == 0, 'Mileag...

So outliers can be handled by imputing the median?

serene scaffold Sep 13, 2021, 9:09 PM

#

bold timber So outlier can handled by imputing median?

yes

#

you might use a more sophisticated way of determining what an outlier is, like standard deviation, or something

bold timber Sep 13, 2021, 9:10 PM

#

serene scaffold you might use a more sophisticated way of determining what an outlier is, like s...

How to do this? Can u elaborate?

serene scaffold Sep 13, 2021, 9:15 PM

#

bold timber How to do this? Can u elaborate?

milage = df['Mileage_kmpl']
is_outlier = (milage - milage.mean()).abs() >= 3 * milage.std()
df.loc[is_outlier, 'Mileage_kmpl'] = milage.median()

bold timber Sep 13, 2021, 9:16 PM

#

serene scaffold ```py milage = df['Mileage_kmpl'] is_outlier = (milage - milage.mean()).abs() >=...

what is the number 3 for?

serene scaffold Sep 13, 2021, 9:16 PM

#

bold timber what is number of 3?

for three standard deviations

bold timber Sep 13, 2021, 9:16 PM

#

serene scaffold for three standard deviations

ok thank you
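The `3` above is the number of standard deviations a value must be from the mean to count as an outlier. A small sketch that makes the threshold a parameter; the helper name `std_outliers` and the sample mileages are made up for illustration:

```python
import pandas as pd

def std_outliers(s: pd.Series, k: float = 3.0) -> pd.Series:
    """True where a value lies k or more sample standard deviations from the mean."""
    return (s - s.mean()).abs() >= k * s.std()

# Made-up mileages: 20 plausible values plus one extreme entry
df = pd.DataFrame({"Mileage_kmpl": [15.0 + 0.5 * i for i in range(20)] + [500.0]})

milage = df["Mileage_kmpl"]
is_outlier = std_outliers(milage, k=3)
# Replace flagged values with the column median (computed before the update)
df.loc[is_outlier, "Mileage_kmpl"] = milage.median()
print(df["Mileage_kmpl"].max())
```

One caveat: pandas' `std()` is the sample standard deviation, and with n values no point can lie more than (n - 1)/sqrt(n) standard deviations from the mean. With fewer than 11 rows, a 3-standard-deviation rule can therefore never flag anything.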

digital sparrow Sep 13, 2021, 10:13 PM

#

Does anyone know a pandas command to drop all rows below a certain row number? For example, row 21 and below need to be removed.

iron basalt Sep 13, 2021, 10:21 PM

#

digital sparrow Anyone know a pandas command to drop all rows below a certain row number for exa...

>>> df = pd.DataFrame(np.random.rand(6,2), index=range(0,18,3), columns=['A', 'B'])
>>> df
           A         B
0   0.832698  0.573643
3   0.917200  0.880660
6   0.577085  0.011027
9   0.964913  0.153146
12  0.012555  0.593801
15  0.792496  0.515415
>>> df[df.index > 6]
           A         B
9   0.964913  0.153146
12  0.012555  0.593801
15  0.792496  0.515415

digital sparrow Sep 13, 2021, 10:23 PM

#

Thanks!
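The filter above works on index labels. If "row 21" means the 21st row by position regardless of the index, positional slicing with `iloc` (or `head`) is an alternative. A sketch on a made-up 30-row frame, assuming 1-based counting so that row 21 onward is dropped:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(np.arange(60).reshape(30, 2), columns=["A", "B"])

# Position-based: keep the first 20 rows (positions 0-19), dropping row 21 onward
first_20 = df.iloc[:20]      # same rows as df.head(20)

# Label-based: keep rows whose index label is below 20
# (matches the positional version only because this index is 0..29)
by_label = df[df.index < 20]
```

With a non-default index, such as the `range(0, 18, 3)` one above, the two approaches select different rows, so it is worth deciding which one is meant.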

lapis sequoia Sep 13, 2021, 11:06 PM

#

Hi, how do I extract all tickers from Bloomberg? I'm still looking.

#

I'm using the xbbg library

hasty grail Sep 13, 2021, 11:24 PM

#

serene scaffold ```py milage = df['Mileage_kmpl'] df.loc[milage > 100_000 | milage == 0, 'Mileag...

I think you need to enclose each boolean mask in brackets because of operator precedence

serene scaffold Sep 13, 2021, 11:50 PM

#

hasty grail I think you need to enclose each boolean mask in brackets because of operator pr...

that makes sense

desert oar Sep 13, 2021, 11:50 PM

#

R operators have different precedences so you don't need them in R

serene scaffold Sep 13, 2021, 11:51 PM

#

idk R
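On the precedence point: in Python, `|` binds more tightly than `>` and `==`, so `milage > 100_000 | milage == 0` is parsed as the chained comparison `milage > (100_000 | milage) == 0` rather than as two masks combined. Parenthesizing each comparison gives the intended mask. A sketch with made-up values:

```python
import pandas as pd

# Made-up mileage values, including a 0.0 and an implausibly large entry
df = pd.DataFrame({"Mileage_kmpl": [18.9, 0.0, 21.4, 250_000.0, 17.0]})

milage = df["Mileage_kmpl"]
# Each comparison gets its own parentheses before combining with |
is_bad = (milage > 100_000) | (milage == 0)
df.loc[is_bad, "Mileage_kmpl"] = milage.median()
print(df)
```

This keeps the full-column median used in the thread; `milage[~is_bad].median()` would compute it from the valid values only.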

lapis sequoia Sep 14, 2021, 12:27 AM

#

Can anyone recommend some scientific Python projects on GitHub that use a parameters file as input to a computational model?

desert oar Sep 14, 2021, 12:29 AM

#

maybe DVC has some examples

lapis sequoia Sep 14, 2021, 12:30 AM

#

DVC?

desert oar Sep 14, 2021, 12:30 AM

#

https://dvc.org

Data Version Control · DVC

Open-source version control system for Data Science and Machine Learning projects. Git-like experience to organize your data, models, and experiments.

#

i don't know if they have anything, but maybe they have some examples repository somewhere

lapis sequoia Sep 14, 2021, 12:36 AM

#

I found a project called pyrk on GitHub that uses a Python file for input parameters. But I'm curious how others handle parameters especially if those parameters are defined in text file like TOML or YAML. https://github.com/pyrk/pyrk


GitHub - pyrk/pyrk: Python for Reactor Kinetics

Python for Reactor Kinetics. Contribute to pyrk/pyrk development by creating an account on GitHub.

timid saffron Sep 14, 2021, 2:08 AM

#

What is this?

vital stone Sep 14, 2021, 5:15 AM

#

same question here

inner pebble Sep 14, 2021, 6:01 AM

#

Hello there,
(I am working with pandas)
I have a column in which each row is a list of date ranges stored as strings, like: [2021-07-23--2021-07-30, 2021-07-16--2021-07-21, ...]

I would like to first explode the date ranges in each row, and then create one column for the start date and another for the end date.

How would you do that?

Thanks for your help.

#

I did it by exploding first and then manually splitting the string in each row, but I'm looking for something cleaner.

shut dock Sep 14, 2021, 6:17 AM

#

I would start with splitting the string by '--', then splitting those by '-', and saving each as a datetime object. So quick and dirty...

#

start_date = datetime.datetime(date_range_entry)[0].split('--')[0].split('-')[0], date_range_entry)[0].split('--')[0].split('-')[1], date_range_entry)[0].split('--')[0].split('-')[2])

#

I screwed that up a bit with the copy/paste, but do you see what I'm after?

#

start_date = datetime.datetime(date_range_entry.split('--')[0].split('-')[0], date_range_entry.split('--')[0].split('-')[1], date_range_entry.split('--')[0].split('-')[2]) there we go

#

end_date = datetime.datetime(date_range_entry.split('--')[1].split('-')[0], date_range_entry.split('--')[1].split('-')[1], date_range_entry.split('--')[1].split('-')[2])

#

https://www.w3schools.com/python/python_datetime.asp

#

You may have to convert the string to int, now that I'm looking at it

inner pebble Sep 14, 2021, 6:24 AM

#

ahhh yes, now I understand your idea. That's a good idea.
I will try this, thanks @shut dock

shut dock Sep 14, 2021, 6:24 AM

#

start_date = datetime.datetime(int(date_range_entry.split('--')[0].split('-')[0]), int(date_range_entry.split('--')[0].split('-')[1]), int(date_range_entry.split('--')[0].split('-')[2]))

#

Sure thing 👍

tender hearth Sep 14, 2021, 6:34 AM

#

datetime accepts strings
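For reference, the `datetime.datetime(year, month, day)` constructor expects integers, but the standard library can parse ISO-formatted date strings directly: `date.fromisoformat` (Python 3.7+) handles `YYYY-MM-DD`, which avoids the manual `split('-')` and `int()` calls. A sketch; the helper name `parse_range` is made up:

```python
from datetime import date
from typing import Tuple

def parse_range(entry: str) -> Tuple[date, date]:
    """Split a 'YYYY-MM-DD--YYYY-MM-DD' string into (start, end) dates."""
    start_str, end_str = entry.split("--")
    return date.fromisoformat(start_str), date.fromisoformat(end_str)

start_date, end_date = parse_range("2021-07-23--2021-07-30")
print(start_date, end_date)  # 2021-07-23 2021-07-30
```

For formats other than ISO, `datetime.strptime(s, "%Y-%m-%d")` with a matching format string does the same job.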

desert oar Sep 14, 2021, 6:35 AM

#

!eval @inner pebble ```python
import pandas as pd

df = pd.DataFrame({
    "y": pd.Series([
        "2021-07-23--2021-07-30",
        "2021-07-16--2021-07-21",
    ])
})
print(df)

y_ranges = df['y'].str.split('--', expand=True)
y_ranges = y_ranges.apply(pd.to_datetime)
df[["y_start", "y_end"]] = y_ranges
print(df)

arctic wedgeBOT Sep 14, 2021, 6:35 AM

#

@desert oar :white_check_mark: Your eval job has completed with return code 0.

001 |                         y
002 | 0  2021-07-23--2021-07-30
003 | 1  2021-07-16--2021-07-21
004 |                         y    y_start      y_end
005 | 0  2021-07-23--2021-07-30 2021-07-23 2021-07-30
006 | 1  2021-07-16--2021-07-21 2021-07-16 2021-07-21

desert oar Sep 14, 2021, 6:36 AM

#

if you run df.info(), you'll see that those 2 new columns are indeed datetime data and not strings

inner pebble Sep 14, 2021, 6:42 AM

#

expand=True 👍 I forgot this option
and I hadn't thought of using a separate variable (y_ranges) to perform the split and then assign it back to the initial df
Thanks @desert oar 👍

desert oar Sep 14, 2021, 6:48 AM

#

@inner pebble you can do it without the intermediate variable too 🙂

df[["y_start", "y_end"]] = (
    df['y']
    .str.split('--', expand=True)
    .apply(pd.to_datetime)
)

#

but yes, the key is expand=True
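The original question has a list of ranges in each row, so the same split can be combined with `explode`. A sketch, assuming the column holds actual Python lists of strings (not one long string) and pandas 1.1+ for `ignore_index`; the data is made up:

```python
import pandas as pd

# Made-up frame: each row holds a list of "start--end" strings
df = pd.DataFrame({
    "y": [
        ["2021-07-23--2021-07-30", "2021-07-16--2021-07-21"],
        ["2021-08-01--2021-08-07"],
    ]
})

# One row per range, then split each range into start/end datetime columns
out = df.explode("y", ignore_index=True)
out[["y_start", "y_end"]] = (
    out["y"]
    .str.split("--", expand=True)
    .apply(pd.to_datetime)
)
print(out)
```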

gaunt marsh Sep 14, 2021, 8:43 AM

#

desert oar there we go <@!200805179509964800> ☝️

I already have tuples and it looks like this:

[(200, 200, 215), (161, 162, 172), (72, 45, 31), (116, 75, 33), (182, 182, 195), (103, 63, 26), (151, 152, 156), (211, 211, 228), (190, 191, 204), (98, 75, 49), (93, 51, 23), (135, 135, 135), (117, 107, 84), (163, 99, 35), (172, 173, 184), (172, 173, 184), (89, 84, 75), (163, 148, 120), (167, 152, 124), (173, 162, 142), (237, 235, 217), (108, 101, 90), (122, 115, 101), (135, 128, 115), (199, 190, 170), (144, 129, 103), (155, 141, 114), (213, 206, 187), (188, 177, 155), (151, 144, 128), (70, 68, 62)]

#

Used this: rgbs = list(map(tuple, list_of_three_values))

dawn orbit Sep 14, 2021, 9:17 AM

#

Hi, I have a problem in a published Jupyter notebook about an algorithm that's called S3D. There is a figure that is meant to visualize your trained model, but for some reason I get this error "Length of values (1) does not match length of index (9)" when running the notebook's code block. What's written in the error log is also quite complicated, so I have no idea where to start debugging. Could someone here give me a hand with this?

worn bough Sep 14, 2021, 11:30 AM

#

@dawn orbit could you give the command that raises the error? And do you know what the 'values' and the 'index' are?

dawn orbit Sep 14, 2021, 1:34 PM

#

worn bough <@!145571789555236864> could you give the command that raises the error? And do ...

The command that issues the error is the following: i_max_arr = pd.np.argwhere(i_series==max_val).flatten()

The values are 5-digit numbers which are to be interpreted as values under 1, e.g. 67463 = 0.67463, and they represent the coefficient of determination. The index would be the features (column names), which appear on the Y-axis. The visual itself is a horizontal bar chart.
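Separately, `pd.np` has been deprecated since pandas 1.0 and is not available in recent versions, so that line can fail on newer installs regardless of the data; importing NumPy directly is the usual replacement. It is not certain this is what causes the length mismatch, which may come from elsewhere in the cell. A sketch of the same lookup with made-up scores (only `i_series`, `max_val` and `i_max_arr` come from the notebook line):

```python
import numpy as np
import pandas as pd

# Made-up R^2-style scores indexed by feature name
i_series = pd.Series([0.61, 0.67, 0.67, 0.42], index=["f1", "f2", "f3", "f4"])
max_val = i_series.max()

# Positions of every maximum (ties included), via NumPy rather than pd.np
i_max_arr = np.argwhere((i_series == max_val).to_numpy()).flatten()
max_features = i_series.index[i_max_arr]   # matching feature names
```

`np.flatnonzero(i_series == max_val)` returns the same positions in one call.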

mortal dove Sep 14, 2021, 1:38 PM

#

I've mainly been using R for time series analysis since that's just what we've been using in uni classes.
Which libraries are generally used for it in python?

dawn orbit Sep 14, 2021, 1:48 PM

#

I'd say Prophet and Darts

#

But you probably already used Prophet if you've used R

prime hearth Sep 14, 2021, 1:55 PM

#

hello, i would like to please ask

#

what is the easiest way to implement sentiment analysis?

#

I've done a little research, and most approaches use natural language processing with algorithms such as these:
Neural Network/CNN/RNN
TextBlob
MultinomialNB
TfidfVectorizer
K-means
SVM
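Two items from that list, `TfidfVectorizer` and `MultinomialNB` from scikit-learn, combine into a common quick supervised baseline. A toy sketch; the six training sentences and the `pos`/`neg` labels are made up, and a real model needs a properly labeled dataset:

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

# Made-up labeled examples
texts = [
    "I love this movie", "what a great film", "absolutely wonderful acting",
    "I hate this movie", "what a terrible film", "absolutely awful acting",
]
labels = ["pos", "pos", "pos", "neg", "neg", "neg"]

# TF-IDF features feeding a multinomial naive Bayes classifier
clf = make_pipeline(TfidfVectorizer(), MultinomialNB())
clf.fit(texts, labels)

print(clf.predict(["love it, wonderful", "awful and terrible"]))
```

The pipeline keeps vectorizing and classifying together, so new text goes straight into `predict`; swapping `MultinomialNB` for an SVM such as `LinearSVC` is a one-line change.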

gaunt marsh Sep 14, 2021, 2:28 PM

#

Do you know how to make the bars thicker and the ticks per bar visible? There are about 129,000 values, plotted as a matplotlib bar chart.

ebon walrus Sep 14, 2021, 2:45 PM

#

gaunt marsh Do you know how to make the bars thicker and the ticks per bar visible? These ar...

you cant expect to show that many values in such a small monitor

#

theres no way to spread it out other than find the mean of the data and display it in 5-10 bars max if possible

#

try to find a way around it

gaunt marsh Sep 14, 2021, 2:46 PM

#

There were around 700.000 values and I removed a lot of them 😫

gaunt marsh Sep 14, 2021, 2:46 PM

#

ebon walrus theres no way to spread it out other than find the mean of the data and display ...

Are you sure about that?

ebon walrus Sep 14, 2021, 2:47 PM

#

i mean unless you want to make it a scrollable dataset you could possibly make it so that you could scroll very far with it but i dont have the skills for that

ebon walrus Sep 14, 2021, 2:47 PM

#

gaunt marsh There were around 700.000 values and I removed a lot of them 😫

I'd say find the average of the data

#

there aren't enough colors to display that many values
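A related option to averaging is to keep the most frequent categories and fold the long tail into a single "other" bar, which brings the chart down to about ten bars. A sketch with made-up color counts; `top_n_with_other` is a hypothetical helper name:

```python
import pandas as pd

def top_n_with_other(counts: pd.Series, n: int = 9) -> pd.Series:
    """Keep the n largest counts and sum everything else into one 'other' bar."""
    counts = counts.sort_values(ascending=False)
    top = counts.iloc[:n]
    rest = counts.iloc[n:].sum()
    return pd.concat([top, pd.Series({"other": rest})])

# Made-up color frequencies (color name -> number of occurrences)
freq = pd.Series({f"color_{i}": 1000 - i for i in range(500)})
bars = top_n_with_other(freq, n=9)
```

`bars.plot.bar()` would then draw 10 readable bars instead of 500, and the total across bars is unchanged.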

dawn orbit Sep 14, 2021, 2:50 PM

#

What is the point of the graph? Just looks like colors and their frequency to me

#

What I mean is there has to be a better way to display whatever it is that you want to display in the graph

gaunt marsh Sep 14, 2021, 2:55 PM

#

dawn orbit What is the point of the graph? Just looks like colors and their frequency to me

Y axis: the amount of colors per file
X axis: the amount of colors in all files together

dawn orbit Sep 14, 2021, 3:04 PM

#

gaunt marsh Y axis: the amount of colors per file X axis: the amount of colors in all files...

I find this a weird situation to plot. The way you worded it, to me the X-axis should be a static value since it's just the SUM(...) of all the colors in the files. Besides that I don't see the need to plot a graph for that. Shouldn't a table be enough with the color amounts per file and at the end the grand total of colors?

short heart Sep 14, 2021, 3:34 PM

#

Does sklearn's LabelEncoder automatically fill NaNs?

uncut barn Sep 14, 2021, 3:53 PM

#

Hi guys,

I'm doing a project that involves a binary segmentation task of an image as well as predicting the grade of an image from (0-3), would I need to create 2 separate CNNs for each task or is there some way I can interlink these 2 tasks, any help would be much appreciated.

rigid zodiac Sep 14, 2021, 3:59 PM

#

uncut barn Hi guys, I'm doing a project that involves a binary segmentation task of an ima...

Can I see your CNN code and use it as sample code? I'm currently working on a project similar to yours.

west dagger Sep 14, 2021, 4:47 PM

#

Does anyone know why Selenium can't locate the ChromeDriver when I try to start it?

#

ive added it to /local/bin but joker still wont work

quasi parcel Sep 14, 2021, 7:41 PM

#

hi everyone, I hope you are doing well
I need help constructing a co-occurrence matrix
the data looks like this
so I am trying to construct a customer_id by product_id matrix
the confusion is
each row has a list of product ids, so how can I create a co-occurrence matrix from that?
can anyone help
please

#

this is the sample data

serene scaffold Sep 14, 2021, 7:47 PM

#

quasi parcel hi everyone i hope you are doing well i need help in constructing a co-occurrenc...

df.melt('Product_id').pivot_table(index='Customer_ID', columns='Product_id', aggfunc=np.sum)

See if that works.

#

This would be easier to replicate if you provided the dataframe as text so that we can reproduce it. With screenshots, it's all conjecture.

quasi parcel Sep 14, 2021, 7:48 PM

#

thank you @serene scaffold let me try that

serene scaffold Sep 14, 2021, 7:49 PM

#

If it doesn't work, please do print(df.head().to_csv()) and copy/paste the text into this chat exactly so that we can continue.

quasi parcel Sep 14, 2021, 7:49 PM

#

sure i will do that

#

will 10 rows be sufficient?

serene scaffold Sep 14, 2021, 7:50 PM

#

just the code I provided (namely print(df.head().to_csv())) is fine.

#

though you may need to replace df if it has a different name.

#

I assume it worked; otherwise please ping or I won't know to come back.

quasi parcel Sep 14, 2021, 7:53 PM

#

there is this key error

#

@serene scaffold

serene scaffold Sep 14, 2021, 7:53 PM

#

go ahead and do print(df.head().to_csv()) then and provide the text.

quasi parcel Sep 14, 2021, 7:54 PM

#

sure

arctic wedgeBOT Sep 14, 2021, 7:56 PM

#

Hey @quasi parcel!

Uh-oh! It looks like your message got zapped by our spam filter. We currently don't allow .csv attachments, so here are some tips to help you travel safely:

• If you attempted to send a message longer than 2000 characters, try shortening your message to fit within the character limit or use a pasting service (see below)

• If you tried to show someone your code, you can use codeblocks
(run !code-blocks in #bot-commands for more information) or use a pasting service like:

https://paste.pythondiscord.com

serene scaffold Sep 14, 2021, 7:56 PM

#

It should only be five lines of text.

#

but I guess you can use the paste bin. https://paste.pythondiscord.com/

quasi parcel Sep 14, 2021, 7:58 PM

#

https://paste.pythondiscord.com/raw/apofowisoh

serene scaffold Sep 14, 2021, 7:58 PM

#

thanks. one moment

quasi parcel Sep 14, 2021, 7:59 PM

#

sure thanks a lot

serene scaffold Sep 14, 2021, 8:02 PM

#

quasi parcel sure thanks a lot

df.explode('Product_id').pivot_table(index='Customer_ID', columns='Product_id', aggfunc='count', fill_value=0)

quasi parcel Sep 14, 2021, 8:02 PM

#

could you share the output

#

if you dont mind

serene scaffold Sep 14, 2021, 8:02 PM

#

            Time_stamp                                            ...
Product_id      194757 203271 214387 235573 240620 248684 255443  ... 353982 353987 358385 358386 359707 359783 359784
Customer_ID                                                       ...
2145912              0      0      0      0      0      0      0  ...      0      0      0      0      0      0      0
3803083              0      0      0      0      0      0      0  ...      0      1      0      0      0      0      0
4678591              0      0      1      1      1      0      0  ...      0      0      0      0      0      0      0
5247070              0      0      0      0      0      0      0  ...      0      0      0      0      0      0      0
5371841              0      0      0      0      0      0      0  ...      0      0      1      1      0      0      0
6410476              0      0      0      0      0      0      0  ...      0      0      0      0      0      1      1
6427723              0      0      0      0      0      0      0  ...      1      0      0      0      0      0      0
6428352              1      0      0      0      0      1      0  ...      0      0      0      0      1      0      0
6430668              0      0      0      0      0      0      0  ...      0      0      0      0      0      0      0
6435539              0      1      1      0      0      0      1  ...      0      0      0      0      0      0      0

[10 rows x 23 columns]

quasi parcel Sep 14, 2021, 8:06 PM

#

Seriously, thanks a lot! Now, can we store this in a NumPy array or something?

serene scaffold Sep 14, 2021, 8:06 PM

#

yes, you can do .to_numpy() at the end.

#

and all your wildest dreams will come true

quasi parcel Sep 14, 2021, 8:09 PM

#

can we also do product to product? @serene scaffold

#

please

#

sorry to ask

serene scaffold Sep 14, 2021, 8:09 PM

#

quasi parcel can we also do product to product? <@!253696366952316929>

idk what that means

quasi parcel Sep 14, 2021, 8:09 PM

#

the same co-occurrence matrix, but product to product

serene scaffold Sep 14, 2021, 8:10 PM

#

let me think

desert oar Sep 14, 2021, 8:11 PM

#

quasi parcel same co-occurance matrix for product to product

use the scikit-learn multilabel binarizer 🙂

serene scaffold Sep 14, 2021, 8:12 PM

#

that might be better lol

serene scaffold Sep 14, 2021, 8:13 PM

#

desert oar use the scikit-learn multilabel binarizer 🙂

df.drop(['Time_stamp', 'Customer_ID'], axis=1).explode('Product_id').reset_index().groupby('index')

this is as far as I've gotten

#

I didn't really have a plan

quasi parcel Sep 14, 2021, 8:21 PM

#

thank you so much @serene scaffold and @desert oar

desert oar Sep 14, 2021, 8:26 PM

#

that pivot table thing was pretty clever

#

i would have probably done it with a loop or something

quasi parcel Sep 14, 2021, 8:31 PM

#

I have a question @desert oar: I've done it with the binarizer, but

#

it's not doing co-occurrence

#

i mean i want like this

#

but I need to add the number of occurrences of the product ids
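Continuing the binarizer idea: `MultiLabelBinarizer` by itself only turns each transaction's list into a 0/1 row (transactions by products). Multiplying the transpose of that matrix by the matrix itself gives, for every pair of products, the number of transactions that contain both, which is a product-to-product co-occurrence matrix. A sketch with made-up transactions:

```python
import numpy as np
import pandas as pd
from sklearn.preprocessing import MultiLabelBinarizer

# Made-up transactions: one list of product ids per row
df = pd.DataFrame({
    "customer_id": ["c123", "c456", "c789"],
    "product_ids": [["a", "c"], ["b", "c", "f"], ["a", "c", "e"]],
})

mlb = MultiLabelBinarizer()
X = mlb.fit_transform(df["product_ids"])  # rows: transactions, cols: products (0/1)

# X.T @ X counts, for each product pair, the transactions containing both
counts = X.T @ X
# The diagonal holds how many transactions contain each product; zero it if unwanted
np.fill_diagonal(counts, 0)
cooc = pd.DataFrame(counts, index=mlb.classes_, columns=mlb.classes_)
print(cooc)
```

The result is symmetric, so a pair's count appears both at (a, c) and at (c, a).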

desert oar Sep 14, 2021, 9:02 PM

#

!e @quasi parcel ```python
from itertools import combinations
import pandas as pd

def unpack_to_columns(series, colnames=None):
    return pd.DataFrame(series.tolist(), columns=colnames)

df = pd.DataFrame({
    "customer_id": ["c123", "c456", "c789"],
    "product_ids": [["a", "c"], ["b", "c", "f"], ["a", "c", "e"]],
}).rename_axis(index="transaction_id")

unique_prod_ids = df['product_ids'].explode().unique()

adjmat_prod_prod = (
    unpack_to_columns(
        df['product_ids']
        .apply(lambda x: [list(pair) for pair in combinations(x, 2)])
        .explode(),
        colnames=['product_id_1', 'product_id_2']
    )
    .pivot_table(index='product_id_1', columns='product_id_2', aggfunc=len)
    .fillna(0)
    .astype(int)
    .reindex(index=unique_prod_ids, columns=unique_prod_ids, fill_value=0)
)

print(adjmat_prod_prod)

arctic wedgeBOT Sep 14, 2021, 9:02 PM

#

@desert oar :white_check_mark: Your eval job has completed with return code 0.

001 | product_id_2  a  c  b  f  e
002 | product_id_1               
003 | a             0  2  0  0  1
004 | c             0  0  0  1  1
005 | b             0  1  0  1  0
006 | f             0  0  0  0  0
007 | e             0  0  0  0  0

desert oar Sep 14, 2021, 9:04 PM

#

!e and the customer-product matrix:

import pandas as pd

df = pd.DataFrame({
    "customer_id": ["c123", "c456", "c789"],
    "product_ids": [["a", "c"], ["b", "c", "f"], ["a", "c", "e"]],
}).rename_axis(index="transaction_id")

unique_cust_ids = df['customer_id'].unique()
unique_prod_ids = df['product_ids'].explode().unique()

adjmat_cust_prod = (
    df
    .explode('product_ids')
    .rename(columns={'product_ids': 'product_id'})
    .pivot_table(index='customer_id', columns='product_id', aggfunc=len)
    .fillna(0)
    .astype(int)
    .reindex(index=unique_cust_ids, columns=unique_prod_ids)
)

print(adjmat_cust_prod)

arctic wedgeBOT Sep 14, 2021, 9:04 PM

#

@desert oar :white_check_mark: Your eval job has completed with return code 0.

001 | product_id   a  c  b  f  e
002 | customer_id               
003 | c123         1  1  0  0  0
004 | c456         0  1  1  1  0
005 | c789         1  1  0  0  1

desert oar Sep 14, 2021, 9:05 PM

#

i also cooked up versions using scipy sparse matrices if this data is really big, i can show those if you want

#

https://paste.pythondiscord.com/tiqerecaho.py

#

@serene scaffold you might be interested too

serene scaffold Sep 14, 2021, 9:10 PM

#

desert oar <@!253696366952316929> you might be interested too

lemon_hyperpleased farnsworth

desert oar Sep 14, 2021, 9:25 PM

#

of course, if the data is really huge (streaming from 1 billion gzipped rows or a data warehouse query), you will want to use a dok_matrix instead and fill it with a boring old loop

#

you'll need to incrementally build the product_id -> i dict in that case, too
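A minimal sketch of that streaming approach, assuming transactions arrive as an iterable of product-id lists and that an upper bound on the number of distinct products can be fixed in advance (`MAX_PRODUCTS` is a made-up value), since a `dok_matrix` is created with a fixed shape. The function name is hypothetical:

```python
from itertools import combinations
from scipy.sparse import dok_matrix

MAX_PRODUCTS = 10_000  # assumed upper bound on distinct product ids

def cooccurrence_dok(transactions, max_products=MAX_PRODUCTS):
    """Count product pairs per transaction into a DOK sparse matrix."""
    prod_to_i = {}  # product_id -> row/column index, built incrementally
    counts = dok_matrix((max_products, max_products), dtype=int)
    for products in transactions:
        # setdefault assigns the next free index to unseen products
        idxs = [prod_to_i.setdefault(p, len(prod_to_i)) for p in products]
        for i, j in combinations(idxs, 2):
            counts[i, j] += 1
            counts[j, i] += 1
    n = len(prod_to_i)
    # Trim to the products actually seen and convert for fast arithmetic
    return counts.tocsr()[:n, :n], prod_to_i

m, prod_to_i = cooccurrence_dok([["a", "c"], ["b", "c", "f"], ["a", "c", "e"]])
print(m[prod_to_i["a"], prod_to_i["c"]])  # 2
```

Because the function only iterates over `transactions`, it can consume a generator reading rows from a file or query cursor without loading everything into memory first.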

quasi parcel Sep 14, 2021, 9:41 PM

#

@desert oar thank you so much

charred umbra Sep 14, 2021, 9:45 PM

#

https://www.researchgate.net/publication/354294890_The_Effect_of_Artificial_Amalgamates_on_Identifying_Pathogenesis

#

So this is my paper for an algorithm I made for identifying diseases using semi-supervised learning

#

you guys have any suggestions for making the algorithm better?

quasi parcel Sep 14, 2021, 10:15 PM

#

@desert oar Hi really thank you, can you suggest me which course you have done for data sciences? please

desert oar Sep 14, 2021, 10:20 PM

#

quasi parcel <@!389497659087650836> Hi really thank you, can you suggest me which course you...

a masters degree, lots of self-study, and several years of work experience with people smarter than me 🙂

#

i didn't take any standalone data science courses

velvet thorn Sep 14, 2021, 10:49 PM

#

desert oar a masters degree, lots of self-study, and several years of work experience with ...

ooh what's your master's in

#

I'm looking to take one next year 🙏

desert oar Sep 14, 2021, 10:52 PM

#

velvet thorn ooh what's your master's in

the degree was a blend of social science and applied stats, but the course schedule was very open, so i mostly took stats classes

#

and some moderately regrettable "applied machine learning" classes

velvet thorn Sep 14, 2021, 10:52 PM

#

desert oar and some moderately regrettable "applied machine learning" classes

why regrettable

desert oar Sep 14, 2021, 10:52 PM

#

not very rigorous, fluffy assignments and exams

#

although one of them was my first introduction to unix (literally a mainframe operated by the university) so that was a very positive experience

#

i also didn't have a thesis adviser until my thesis was already mostly done... oops

arctic wedgeBOT Sep 14, 2021, 10:54 PM

#

Hey @quiet vault!

Uh-oh! It looks like your message got zapped by our spam filter. We currently don't allow .txt attachments, so here are some tips to help you travel safely:


velvet thorn Sep 14, 2021, 10:54 PM

#

desert oar not very rigorous, fluffy assignments and exams

this is something I've heard

#

were they classes targeted @ students in general (i.e. "breadth" courses)?

desert oar Sep 14, 2021, 11:27 PM

#

velvet thorn were they classes targeted @ students in general (i.e. "breadth" courses)?

I'm not sure, "data science" was kind of a new thing at the time

#

I think it was a combination of departments dialing in the right difficulty levels, and catering to students in less rigorous masters degrees

#

I remember I specifically took the "applied" classes because I wanted lots of project experience, but that was a mistake

#

It turns out that my friends who took the not-"applied" version not only had more rigorous material, but also had much more intensive projects, and professors who were better connected in industry

serene scaffold Sep 14, 2021, 11:40 PM

#

!d pandas.DataFrame.idxmax

arctic wedgeBOT Sep 14, 2021, 11:40 PM

#

pandas.DataFrame.idxmax


DataFrame.idxmax(axis=0, skipna=True)
Return index of first occurrence of maximum over requested axis.

NA/null values are excluded.

serene scaffold Sep 14, 2021, 11:41 PM

#

@desert oar where is the level=?

#

||WHERE IS IT?!||

desert oar Sep 14, 2021, 11:41 PM

#

Does it return a tuple if it's a multiindex?

serene scaffold Sep 14, 2021, 11:42 PM

#

it returns a Series

desert oar Sep 14, 2021, 11:42 PM

#

Interesting

#

Oh with axis

#

!e ```
import pandas as pd

idx = pd.MultiIndex.from_tuples([
    ("a", "x"), ("b", "y"), ("c", "z")
], names=["l1", "l2"])
y = pd.Series([1,2,3], index=idx, name="y")

print(y.idxmax())

arctic wedgeBOT Sep 14, 2021, 11:47 PM

#

@desert oar :white_check_mark: Your eval job has completed with return code 0.

('c', 'z')

desert oar Sep 14, 2021, 11:47 PM

#

@serene scaffold ☝️

#

This took me like 10 minutes to type on my phone

serene scaffold Sep 14, 2021, 11:48 PM

#

desert oar This took me like 10 minutes to type on my phone

you did all that for me? lemon_sentimental

desert oar Sep 14, 2021, 11:48 PM

#

Lol yes

serene scaffold Sep 14, 2021, 11:48 PM

#

but look at your DMs!

desert oar Sep 14, 2021, 11:48 PM

#

Heh i see

#

You get an array of tuples then right?

serene scaffold Sep 14, 2021, 11:49 PM

#

Series (of tuples)

desert oar Sep 14, 2021, 11:49 PM

#

Seems like that can work with .loc

serene scaffold Sep 14, 2021, 11:49 PM

#

it's not enough information though

#

there needs to be three selections per row

#

one for precision, recall, and f1

#

I don't need to know that the CRF precision was usually best because what that really means is that it was hyper conservative in making predictions lemon_angrysad

desert oar Sep 14, 2021, 11:50 PM

#

Do 3 separate idxmax'es, one per column

serene scaffold Sep 14, 2021, 11:50 PM

#

isn't that basically what my groupby was?

desert oar Sep 14, 2021, 11:51 PM

#

Yeah but any time i see is_max = x == x_max i squirm

serene scaffold Sep 14, 2021, 11:51 PM

#

also
what about ties?

#

there were a few

desert oar Sep 14, 2021, 11:51 PM

#

Aha

#

Idxmax takes the first

serene scaffold Sep 14, 2021, 11:51 PM

#

in which case they both get to be bold

desert oar Sep 14, 2021, 11:51 PM

#

In that case yes do your thing

#

It would be nice if they supported that natively

#

On a bigger data set that could be the difference between a fast operation and a very slow one

#

Numpy probably supports it

serene scaffold Sep 14, 2021, 11:52 PM

#

pandas not having something that numpy has despite being a layer on top of numpy

desert oar Sep 14, 2021, 11:52 PM

#

You can compute all maxima and their indices in a single pass over the data, it's annoying that the top level functions don't make it easy to do that

desert oar Sep 14, 2021, 11:53 PM

#

serene scaffold > pandas not having something that numpy has despite being a layer on top of num...

Well you'd have to use numpy functions for it, and manually reattach other indexes like column names

#

But yeah pandas can be weird like that
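To compare the two approaches discussed above: `idxmax` returns a single label per column (the first one on ties), while comparing each column against its maximum marks every tied winner. The equality check is exact here because `max()` returns one of the stored values. A sketch with made-up scores; the model names other than CRF are invented:

```python
import pandas as pd

# Made-up evaluation scores; only "CRF" comes from the discussion
scores = pd.DataFrame(
    {
        "precision": [0.91, 0.85, 0.91],
        "recall": [0.62, 0.80, 0.74],
        "f1": [0.74, 0.82, 0.82],
    },
    index=["CRF", "BiLSTM", "BERT"],
)

# One winner per column; ties resolve to the first row
first_best = scores.idxmax()

# Every tied winner per column (e.g. to bold all of them in a table)
is_best = scores.eq(scores.max())
print(is_best)
```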

serene scaffold Sep 14, 2021, 11:54 PM

#

https://tenor.com/view/spongebob-no-oh-hell-nope-dont-like-that-gif-15479936


tender hearth Sep 15, 2021, 12:39 AM

#

Hey folks, I'm a bit confused on how Transformers work:
If I understand right, at the beginning of the decoding process, a special '<start>' token is fed to the decoder. That input goes through the first MHA layer, and then to the second MHA layer, this time along with the encoder output. Then the decoder loops this process until the special '<end>' token is produced.

If this is the case, why is masking the decoder input during training even required? Why not just feed the '<start>' token, wait for the decoder to finish, and then compute loss based on the decoder output and the ground truth?

serene scaffold Sep 15, 2021, 12:46 AM

#

tender hearth Hey folks, I'm a bit confused on how Transformers work: If I understand right, a...

I believe it's so that tokens that are often found at the beginning of a sequence have something to attend to.

tender hearth Sep 15, 2021, 12:47 AM

#

Can you elaborate? I don't understand

serene scaffold Sep 15, 2021, 12:47 AM

#

What do you know about attention?

tender hearth Sep 15, 2021, 12:50 AM

#

Still learning about it, but I think I have basic understanding. The attention mechanism computes an attention vector that determines how much weight a particular part of the input should have in the output

serene scaffold Sep 15, 2021, 12:51 AM

#

tender hearth Still learning about it, but I think I have basic understanding. The attention m...

let's even just think about it in terms of language itself. Suppose you have the sentence, "Bob was running on the [MASK]". What word in that sentence do you think will do the most to determine what could replace the mask?

tender hearth Sep 15, 2021, 12:52 AM

#

track?

serene scaffold Sep 15, 2021, 12:52 AM

#

but what word would determine that "track" is an acceptable replacement?

tender hearth Sep 15, 2021, 12:52 AM

#

running

serene scaffold Sep 15, 2021, 12:52 AM

#

"track" is not in the sentence currently

#

yes 😄

#

so whatever you pick for the mask, we would say that that word attends to "running" more than "was".

tender hearth Sep 15, 2021, 12:53 AM

#

Ok, so mask the output during training, but what happens during inference?

#

When you don't know the length of the output and thus can't compute a masking vector

serene scaffold Sep 15, 2021, 12:55 AM

#

let me think

#

I'm not sure sad_cat

tender hearth Sep 15, 2021, 12:58 AM

#

Haha, it's fine

serene scaffold Sep 15, 2021, 12:59 AM

#

have you read the original paper?

#

https://arxiv.org/abs/1706.03762


Attention Is All You Need

The dominant sequence transduction models are based on complex recurrent or
convolutional neural networks in an encoder-decoder configuration. The best
performing models also connect the encoder...

tender hearth Sep 15, 2021, 1:00 AM

#

It was a bit too convoluted for me, so I escaped to blog posts, but I agree, I think the paper has the answer

serene scaffold Sep 15, 2021, 1:00 AM

#

I guess I need to read it too

#

sad_cat

#

don't duck that, it's just a sad cat

velvet thorn Sep 15, 2021, 1:26 AM

#

the paper isn’t written very well IMO

#

but maybe it’s because I don’t have a math background

#

😔

#

wait, wrong paper

#

ignore me

tender hearth Sep 15, 2021, 1:41 AM

#

@serene scaffold Okay so it turns out masking is only done during training. And masking is done in the first place because the ground truth is fed into the decoder in parallel (instead of sequentially)

#

I was under the impression that it did the same thing as it does in inference, i.e. loops autoregressively
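To make the training/inference difference concrete: during training the whole ground-truth target sequence (shifted right) is fed to the decoder at once, so a causal (look-ahead) mask stops position i from attending to positions after i, which it is supposed to predict. At inference the decoder runs one step at a time on its own previous outputs, so there are no future tokens to hide; a common approach is simply to rebuild the same triangular mask for the current prefix length at each step. A NumPy sketch of masked attention weights, where `scores` is random data standing in for the scaled dot products:

```python
import numpy as np

def causal_mask(n: int) -> np.ndarray:
    """True where query position i may attend to key position j (j <= i)."""
    return np.tril(np.ones((n, n), dtype=bool))

def masked_softmax(scores: np.ndarray, mask: np.ndarray) -> np.ndarray:
    """Row-wise softmax that gives zero weight to masked (future) positions."""
    scores = np.where(mask, scores, -np.inf)
    scores = scores - scores.max(axis=-1, keepdims=True)  # numerical stability
    weights = np.exp(scores)
    return weights / weights.sum(axis=-1, keepdims=True)

# Teacher forcing: all n target positions are processed in one pass
n = 4
rng = np.random.default_rng(0)
scores = rng.normal(size=(n, n))  # stand-in for Q @ K.T / sqrt(d)
attn = masked_softmax(scores, causal_mask(n))
```

Every weight above the diagonal is zero, so the output at position i depends only on positions 0 to i, which matches what the decoder can see when generating step by step.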

arctic wedgeBOT Sep 15, 2021, 2:17 AM

#

:incoming_envelope: :ok_hand: applied mute to @livid kiln until <t:1631672865:f> (9 minutes and 59 seconds) (reason: newlines rule: sent 126 newlines in 10s).

vagrant kite Sep 15, 2021, 2:20 AM

#

!unmute 581318950760218636

arctic wedgeBOT Sep 15, 2021, 2:20 AM

#

:incoming_envelope: :ok_hand: pardoned infraction mute for @livid kiln.

vagrant kite Sep 15, 2021, 2:21 AM

#

!paste

arctic wedgeBOT Sep 15, 2021, 2:21 AM

#

Pasting large amounts of code

If your code is too long to fit in a codeblock in discord, you can paste your code here:
https://paste.pydis.com/

After pasting your code, save it by clicking the floppy disk icon in the top right, or by typing ctrl + S. After doing that, the URL should change. Copy the URL and post it here so others can see it.

vagrant kite Sep 15, 2021, 2:21 AM

#

use that ^

livid kiln Sep 15, 2021, 2:33 AM

#

https://paste.pythondiscord.com/nowibudame.apache
I'm trying to find the longest contiguous run of index values whose step size equals the modal value of the index differences, and which includes a true/false border that itself satisfies that step size

#

Here the step size is 5

df["diff"] = df["val"].diff()
df["diff"].mode()[0]
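A self-contained version of the two lines above, run on small hypothetical data (the real data is only in the paste link), shows what they compute: `diff()` gives the gap to the previous row (NaN for the first row) and `mode()[0]` picks the most frequent gap, ignoring NaN.

```python
import pandas as pd

# Hypothetical toy data: mostly spaced 5 apart, with one gap of 15.
df = pd.DataFrame({"val": [600, 605, 610, 625, 630, 635]})

df["diff"] = df["val"].diff()   # first row has no predecessor -> NaN
step = df["diff"].mode()[0]     # most frequent gap; NaN is dropped by default

print(df["diff"].tolist())
print(step)
```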

#

Here the true/false border is between 645 and 650, and their difference does satisfy the step size of 5

#

df[600:700]

livid kiln Sep 15, 2021, 4:12 AM

#

I've completed the task, any suggestion on how to improve this code?

import numpy as np
import pandas as pd

df["diff"] = df["val"].diff()
diffmode = df["diff"].mode()[0]
idx = df[df["diff"] <= diffmode].index
# Remove index values whose difference is smaller than diffmode
df = df.loc[pd.Index(np.arange(idx.min() - diffmode, idx.max() + diffmode, diffmode)).intersection(df.index)]
df["diff"] = df["val"].diff()
# Select index values whose difference is a (non-modal) multiple of diffmode
nonmodal = np.setdiff1d(df["diff"].dropna().unique(), diffmode)
bands = df.loc[df["diff"].isin(nonmodal)]
# Find the latter (low to high, therefore upperBand) index on the true/false boundary
upperBand = bands.loc[bands["truth"].rolling(2).sum() == 1.0]["val"].values[0]
lowerBand = bands.iloc[bands.index.get_loc(upperBand) - 1]["val"]
# Get the inner block between the bands
upperIdx = df.iloc[df.index.get_loc(upperBand) - 1]["val"]
lowerIdx = df.iloc[df.index.get_loc(lowerBand) + 1]["val"]
# .loc slices by label; plain df[a:b] slices by position on an integer index
df.loc[lowerIdx:upperIdx]

EDIT: comments added
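One possible simplification, offered as an alternative sketch rather than the poster's method, labels each contiguous run with a `cumsum` trick: a new run starts wherever the gap to the previous row is not the modal step, and the answer is the run containing the true/false flip. It assumes a DataFrame with a sorted numeric `val` column and a boolean `truth` column that flips exactly once; the function name `run_around_boundary` is hypothetical.

```python
import pandas as pd


def run_around_boundary(df):
    # Gap to the previous row and the most common gap (the "step size").
    diff = df["val"].diff()
    step = diff.mode()[0]

    # A new run starts wherever the gap is not the modal step
    # (the first row's NaN gap also starts a run).
    run_id = (diff != step).cumsum()

    # Mark the first row after "truth" changes value.
    flip = df["truth"].astype(int).diff().fillna(0).ne(0)
    if not flip.any():
        return df.iloc[0:0]
    boundary = int(flip.to_numpy().argmax())  # position of the first flip

    # The flip only counts if it is itself exactly one modal step.
    if diff.iloc[boundary] != step:
        return df.iloc[0:0]

    # All rows sharing the boundary row's run label form the answer.
    return df[run_id == run_id.iloc[boundary]]


df = pd.DataFrame({
    "val": [600, 605, 610, 625, 630, 635, 640, 660],
    "truth": [True, True, True, True, True, False, False, False],
})
print(run_around_boundary(df))
```

Because the run label only changes at non-modal gaps, the boundary row shares a label with the row before it exactly when the flip is one modal step, so no explicit band search is needed.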

desert oar Sep 15, 2021, 5:35 AM

#

add comments, because this is non-trivial code

#

you will probably forget what this does in 2 weeks 🙂

fluid sparrow Sep 15, 2021, 5:50 AM

#

@desert oar hi, can you help me understand how to populate a list in R with numeric values?

desert oar Sep 15, 2021, 5:51 AM

#

@fluid sparrow be specific about:

what you are trying to do
what you already tried
what went wrong when you tried

fluid sparrow Sep 15, 2021, 5:51 AM

#

Sure

desert oar Sep 15, 2021, 5:51 AM

#

also be specific about whether you mean a list or a numeric vector

fluid sparrow Sep 15, 2021, 5:52 AM

#

I am trying to create a list. I did myList = [ 1, 2, 3, 4, 5, 6]

#

and it says object error

#

I have no idea what the difference between a list and a numeric vector is

#

I am completely new to R

desert oar Sep 15, 2021, 5:53 AM

#

are you in a course? or just trying to teach yourself?

#

that's python syntax, not R syntax

fluid sparrow Sep 15, 2021, 5:54 AM

#

I am in a course but I just began

#

I have prior experience with Python

#

Instead of answering this question directly, if it's easier, you can redirect me to good R resources

desert oar Sep 15, 2021, 5:59 AM

#

this is good if you already know R, but i don't know any beginner programming guides off the top of my head https://r4ds.had.co.nz/

#

i can search

#

anyway, R has a very different approach to representing data compared to python

#

python has basic data types like strings, numbers, booleans, etc.

#

R does too, but everything is an array - in most cases, you can't get a "number" by itself, you can only get an array containing 1 number

#

consequently, you won't use lists as frequently as you do in python, because the basic data types are already very list-like

#

as for syntax, R does not have special syntax for constructing lists or "vectors" (the R name for a 1-D array). the c() function constructs a vector, and the list() function constructs a list

fluid sparrow Sep 15, 2021, 6:02 AM

#

@desert oar thank you so much! I will definitely check out the site you referred me to, and I hope I can build my knowledge on it! As you can probably tell, I am very confused and rusty, so hopefully I will get better!

desert oar Sep 15, 2021, 6:02 AM

#

i honestly wouldn't look there to start

#

it assumes you already know the basics

#

# A vector of numbers
v1 <- c(1, 2, 3)

# A vector of strings, called a "character" vector
v2 <- c("hello", "welcome to R")

# A vector of Booleans, called a "logical" vector
v3 <- c(TRUE, FALSE, TRUE)

# A list of vectors
items <- list(v1, v2, v3)
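Since the asker already knows Python, a rough Python analogy to the R code above may help, assuming NumPy is available. This is only an analogy, not an exact equivalence: an R vector built with c() behaves much like a homogeneous 1-D NumPy array (one element type, element-wise arithmetic), while an R list built with list() is closer to a plain Python list that can hold items of different types.

```python
import numpy as np

# Roughly like R's v1 <- c(1, 2, 3): one element type, element-wise math.
v1 = np.array([1, 2, 3])
doubled = v1 * 2  # element-wise, like v1 * 2 in R

# Roughly like R's character and logical vectors.
v2 = np.array(["hello", "welcome to R"])
v3 = np.array([True, False, True])

# Roughly like R's list(v1, v2, v3): a container that mixes types.
items = [v1, v2, v3]

print(doubled)
print(len(items))
```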

fluid sparrow Sep 15, 2021, 6:03 AM

#

Thank you so much again! I also see you have been helping other people; it's truly kind and incredible! I hope I can be as knowledgeable someday

#

I see, so I am trying to create a vector of numbers

#

does the c represent something?