I’ve been optimizing #🔹|hubble BigQuery query on the crypto-stellar.crypto_stellar.history_transactions table to generate a yearly wallet review. For starters, I’m calculating:
• Total successful transactions per account.
• Most active transaction day for each account.
• And transaction count of the most active day.
What I’ve Tried So Far:
1️⃣ Snapshot Table:
• Queried the entire table (~450+ GB processed) to create a snapshot with just three columns (account, batch_run_date, successful).
• Using this snapshot, queries now cost ~100-150 MB per wallet.
2️⃣ Partition Filtering:
• Leveraged partition pruning using partition_id (in YYYYMM format) for yearly queries.
• For example, querying 2024 data for all accounts costs ~99 GB.
• Subsequent queries on the partitioned snapshot also cost ~100-150 MB per wallet, which is fantastic!
3️⃣ Current Working Query:
• Added a WITH clause to pre-aggregate daily transaction counts before calculating the most active day and total transactions.
• This works excellently and is way more optimized compared to earlier!
Questions:
❓ 2. Is there an even better way to structure this query for further optimization?
❓ 3. Do you have any tips on reducing costs for large datasets like this? Any help, insights, or tips would be super appreciated! 🙌