#rest api pymongo and third party data

short apex
#

I'm thinking of building an API with Django and pymongo (MongoDB).
My objective is to create crypto indicators.

I will periodically update my database with data from third-party APIs (should I use Celery?), do some data analysis, and expose the processed data through different endpoints.

Can you guys give me some advice?

gentle olive
#

How often will you need to fetch the data? Last time I did something similar, I just wrote a management command that used the requests library, saved the data as-is in a JSONField (though adding extra processing there would be a non-issue), and read it from my local Postgres DB. I added a cron task to run it periodically.

#

You can use celery-beat for task scheduling, but IMO, cron is just simpler.

#

Celery itself is good for asynchronous tasks... if you trigger the update manually, and it takes some time to run.

short apex
#

Imagine I use 20 different APIs to get data. I will be fetching financial data at different intervals (5 min, 15 min, 30 min, 1 h).

#

I think it will take a while to run, because it's a lot of data.

#

Your solution makes a lot of sense.

gentle olive
#

I would recommend adding lock files to prevent the same process from starting again if a previous run hasn't finished yet (for example, if it runs every 5 minutes but may take longer).
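
A minimal sketch of such a lock file, assuming a *nix host (it relies on `fcntl.flock`, which is not available on Windows). The helper name `single_instance` and the lock path are hypothetical, chosen only for illustration:

```python
import fcntl
import os
import tempfile
from contextlib import contextmanager


@contextmanager
def single_instance(lock_path):
    """Yield True if this run holds the lock, False if another run does."""
    # Open (or create) the lock file; the file itself carries no data.
    fd = os.open(lock_path, os.O_CREAT | os.O_RDWR, 0o644)
    try:
        try:
            # LOCK_NB makes flock fail immediately instead of waiting.
            fcntl.flock(fd, fcntl.LOCK_EX | fcntl.LOCK_NB)
        except BlockingIOError:
            yield False
            return
        try:
            yield True
        finally:
            fcntl.flock(fd, fcntl.LOCK_UN)
    finally:
        os.close(fd)


if __name__ == "__main__":
    # Hypothetical lock location; any path writable by the cron user works.
    lock_path = os.path.join(tempfile.gettempdir(), "fetch_prices.lock")
    with single_instance(lock_path) as acquired:
        if acquired:
            print("running fetch")
        else:
            print("previous run still active, skipping")
```

Because the operating system releases the lock when the process exits, a crashed run should not leave a stale lock behind, which is one reason to prefer `flock` over checking whether a file merely exists.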

short apex
#

I'm very new to Django and DRF. When I searched the web, I didn't find useful resources on consuming third-party APIs. I don't know where my data-fetching code should live in my file structure.

gentle olive
#

Ideally, it should be in some reusable functions. It can live in a directory that isn't even a Django app (i.e. one without apps.py, admin.py, views.py, or models.py files).
For example, you could use myproject/utils/finance/api_consumers.py, have a few reusable functions there (fetch_provider_a(), fetch_provider_b(), ..., process_provider_a(), process_provider_b()...) for sending the requests, and processing them.

#

Those should be agnostic about the fact that they are used in a Django project.
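
As a rough sketch of what such framework-agnostic functions might look like: the endpoint URL and the payload shape (`candles`, `t`, `c`) are assumptions made up for illustration, not any real provider's API, and the standard library's `urllib` stands in for the requests library only to keep the example dependency-free.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Hypothetical endpoint; replace with the real provider URL.
PROVIDER_A_URL = "https://example.com/api/v1/candles"


def http_get_json(url, timeout=10):
    """Default transport: a plain HTTP GET that returns decoded JSON."""
    with urlopen(url, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


def fetch_provider_a(symbol, interval, get_json=http_get_json):
    """Fetch the raw payload for one symbol. No Django imports here."""
    query = urlencode({"symbol": symbol, "interval": interval})
    return get_json(f"{PROVIDER_A_URL}?{query}")


def process_provider_a(raw):
    """Normalise the assumed payload shape into plain dicts."""
    return [
        {"ts": int(row["t"]), "close": float(row["c"])}
        for row in raw.get("candles", [])
    ]
```

Passing the transport in as `get_json` is a design choice that lets the functions be exercised without network access, for example `fetch_provider_a("BTCUSDT", "15min", get_json=lambda url: {"candles": []})`.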

#

Then in your management commands, you can import the models (if applicable), import those functions, run the fetch and processing pipelines, and save the data inside your Django models / database.

#

If you only use mongo, not the ORM for this data, skip the models.

short apex
#

Wow, that cleared things up a lot. I appreciate this immensely. I will study what you said carefully. Is it possible to come back to this post with further questions?

My regards🇧🇷 ❤️

gentle olive
#

Sure thing

#

You can ping me if you want.

#

I try 🙂

short apex
#

you do, believe it

#

You are helping a random guy from Brazil trying to make it.

gentle olive
#

Thank you, it means a lot, honestly 🤩

short apex
#

If I did that with Celery, because I think async could be a good idea (lots of periodic requests), would I use the same strategy with reusable functions and management commands? 🧐

#

What do you think of this GPT approach? ⬇️

#

Reusable Functions: Continue to use reusable Python functions or modules for data fetching, processing, and any other operations. These functions can be organized separately from your Django app and should be designed to work independently of the Django framework.

Celery Tasks: Define Celery tasks that call these reusable functions. Each Celery task can correspond to a specific data-fetching operation. The Celery tasks should use the functions you've created to fetch and process data asynchronously.

Management Commands: You can create management commands in Django that trigger the Celery tasks. These management commands can be used to start or schedule data-fetching tasks. For example, you might have a management command named fetch_crypto_data that initiates the Celery task for fetching cryptocurrency data.

Scheduling: Configure Celery Beat to schedule the execution of your Celery tasks at the desired intervals. This will ensure that your data-fetching tasks run periodically without manual intervention.

Concurrency and Scaling: Adjust the concurrency settings of Celery to control the number of workers and concurrent tasks. As your project scales, you can fine-tune these settings to handle the increased workload.

Locking Mechanism: Implement a locking mechanism to prevent multiple instances of the same task from running simultaneously, especially if a previous run is still in progress. This helps avoid conflicts and data consistency issues.

Monitoring: Use Celery's monitoring and management tools to keep track of task execution, view logs, and ensure that tasks are running as expected.

gentle olive
#

Celery is really good at running async code based on user-generated events.

#

For example, when you upload a video to youtube, youtube runs a variety of tasks. One service transcodes the video to different resolutions, so you can watch them on slower internet. Another uses AI to generate subtitles. One uses fingerprinting to identify copyrighted material, e.g. popular songs.

#

You don't want to use a blocking request for that (e.g. the user should not wait 45 minutes for the page to load because you run all those processes), and you don't want to be running those processes manually.

#

In your view, you can just queue all those tasks, and let celery run them as it can.

#

Management commands run under cron are already running in their own processes, so adding celery adds extra complexity for no benefit, IMO.

short apex
#

That makes sense, I will try to implement it without Celery, using the management commands.

short apex
#

import asyncio

from django.core.management.base import BaseCommand

from utils.derivativos.apiconsumer import get_data_with_rotation


class Command(BaseCommand):
    help = "Fetch data for the given symbols, interval and time range"

    def add_arguments(self, parser):
        parser.add_argument(
            "symbols", type=str, help="list of symbols or a single symbol"
        )
        parser.add_argument(
            "interval", type=str, help="interval: 1hour, 2hour, 15min, etc."
        )
        parser.add_argument("fr", type=int, help="from timestamp")
        parser.add_argument("to", type=int, help="to timestamp")

    async def async_handle(self, *args, **kwargs):
        # Forward the parsed command-line arguments to the async fetcher.
        await get_data_with_rotation(
            kwargs["symbols"], kwargs["interval"], kwargs["fr"], kwargs["to"]
        )

    def handle(self, *args, **kwargs):
        # asyncio.run() creates a fresh event loop and closes it afterwards.
        asyncio.run(self.async_handle(*args, **kwargs))
#

Do you think this approach will work?

#

Because my utils are async.
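
For reference, a minimal sketch of driving async helpers from a synchronous entry point such as a command's handle(). `fetch_one` is a hypothetical stand-in for a real API call, and `run_sync` is an illustrative name, not part of Django or asyncio:

```python
import asyncio


async def fetch_one(symbol, interval):
    """Hypothetical stand-in for one async API call."""
    await asyncio.sleep(0)  # placeholder for real network I/O
    return {"symbol": symbol, "interval": interval}


async def fetch_all(symbols, interval):
    """Run one fetch per symbol concurrently; results keep input order."""
    tasks = [fetch_one(symbol, interval) for symbol in symbols]
    return await asyncio.gather(*tasks)


def run_sync(symbols, interval):
    """Synchronous entry point that owns the event loop for its duration."""
    return asyncio.run(fetch_all(symbols, interval))
```

`asyncio.run` starts a new event loop and closes it when the coroutine finishes, so it suits a short-lived process launched by cron.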

gentle olive
short apex
#

It looks like it's running now, but I'm afraid that's not the best structure.

#

I have doubts about the cron jobs. They would live in my OS, right? I plan to deploy this project to Vercel.

gentle olive
#

I don't know why 😄

#

Cron is a *nix utility that's been in use since 1975.

short apex
#

I don't get it.

#

Or Railway

#

I will use Railway, sorry.

gentle olive
short apex
#

On it!

#

I came across this too.

#

This looks like a different but interesting approach to my problem.

gentle olive
#

I'm not familiar with it. I see no need to reinvent something like cron, which is a normal feature of any *nix OS and properly supported.

#

But you can try it and see 🙂

short apex
#

I'll follow your approach. I will configure the cron jobs on my OS to test, then deploy to Railway and set them up there.

short apex
#

@gentle olive can you help me with some asyncio code?