#✅ - Unpopular Opinion? AWS Scan type Query - Worthless


balmy star
#

Understanding that there are two types of queries depending on whether or not you specify secondary indexes, why does the Scan query even exist? In what scenario would a user running a query want it to hit only part of their DynamoDB table?

I don't understand the logic at all of how this is the default.

If I say listUsers({
  filter: {
    firstName: "Chris"
  },
  limit: 200
})

why in the world would I ever want my query to sift through just the first 200 users if I have 200,000,000 users? To me this just seems worthless, and it overcomplicates things and confuses users. It took me a month before I realized the filter expression isn't actually what I'm querying on; it's just filtering the first 200 users.
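To make the behavior described above concrete, here is a minimal in-memory sketch, assuming the usual DynamoDB Scan semantics where `limit` caps the number of items evaluated and the filter is applied afterwards. `simulatedScan`, `scanAll`, and the `User` shape are hypothetical names for illustration only; they are not Amplify or AWS SDK APIs.

```typescript
// Hypothetical in-memory model of how a Scan combines `limit` and `filter`.
// None of these names come from Amplify or the AWS SDK.
interface User {
  id: number;
  firstName: string;
}

interface ScanPage {
  items: User[];
  nextToken: number | null; // index of the next item to evaluate, or null when done
}

// Evaluate at most `limit` items starting at `startAt`, THEN apply the filter.
function simulatedScan(
  table: User[],
  filter: (u: User) => boolean,
  limit: number,
  startAt: number = 0,
): ScanPage {
  const evaluated = table.slice(startAt, startAt + limit);
  const end = startAt + evaluated.length;
  return {
    items: evaluated.filter(filter),
    nextToken: end < table.length ? end : null,
  };
}

// 1,000 users; ids 0, 250, 500 and 750 are named "Chris".
const users: User[] = Array.from({ length: 1000 }, (_, id) => ({
  id,
  firstName: id % 250 === 0 ? "Chris" : "Other",
}));

const isChris = (u: User): boolean => u.firstName === "Chris";

// A single page with limit 200 only inspects ids 0..199.
const firstPage: ScanPage = simulatedScan(users, isChris, 200);

// Following nextToken until it is null is what covers the whole table.
function scanAll(table: User[], filter: (u: User) => boolean, limit: number): User[] {
  const results: User[] = [];
  let token: number | null = 0;
  while (token !== null) {
    const page: ScanPage = simulatedScan(table, filter, limit, token);
    results.push(...page.items);
    token = page.nextToken;
  }
  return results;
}

const allChrises: User[] = scanAll(users, isChris, 200);
```

In this model a single call returns only the matches among the first 200 items evaluated, plus a token for the next page. Getting every match means following the token until it comes back null.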

In Firebase, if I query for something, I expect it to check everything and return the results.

Has anyone else ever needed just a basic scan query instead of having to specify secondaryIndexes?

left moss
molten arrow
#

Yeah, I use scan for small-scale migrations, but not sure what it would be actually useful for

vast torrent
#

Until we have sorting without indexes or OpenSearch, I have to scan all my records for certain tables and then order them myself 😦

left moss
#

DynamoDB prioritizes fast lookup on specific keys.

Scans: Ideal for one-time, full-table operations or filtering large datasets without a specific index. They can be slow for very large tables.

Queries: Faster and more targeted for retrieving specific items based on the table's primary key or a defined secondary index.

Scans are fine while the database is still small, but as it grows you will want to think more about your data access patterns and create indexes that specifically match the way you usually want to group and sort data. Otherwise, relying solely on Scans becomes increasingly inefficient and therefore costly.

At that point Scans should only be used specifically for table-wide operations as mentioned above.
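The difference in work done can be sketched with a small in-memory model. This is only an illustration under stated assumptions: the `Order` shape, the `byStatus` map standing in for a secondary index, and both functions are hypothetical, and real DynamoDB cost depends on data read per request rather than a simple item count.

```typescript
// Hypothetical in-memory comparison of a Scan and an index-backed Query.
interface Order {
  id: number;
  status: string;
}

interface ReadResult {
  items: Order[];
  itemsRead: number; // how many items had to be looked at to produce `items`
}

// 10,000 orders; one in every hundred is "OPEN".
const orderTable: Order[] = Array.from({ length: 10000 }, (_, id) => ({
  id,
  status: id % 100 === 0 ? "OPEN" : "CLOSED",
}));

// Stand-in for a secondary index keyed on `status` (an assumption for this sketch).
const byStatus = new Map<string, Order[]>();
for (const order of orderTable) {
  const bucket = byStatus.get(order.status) ?? [];
  bucket.push(order);
  byStatus.set(order.status, bucket);
}

// Scan: read every item in the table, keep the ones that match.
function scanByStatus(status: string): ReadResult {
  let itemsRead = 0;
  const items: Order[] = [];
  for (const order of orderTable) {
    itemsRead++;
    if (order.status === status) items.push(order);
  }
  return { items, itemsRead };
}

// Query: jump straight to the matching bucket via the index.
function queryByStatus(status: string): ReadResult {
  const items = byStatus.get(status) ?? [];
  return { items, itemsRead: items.length };
}

const scanned: ReadResult = scanByStatus("OPEN");
const queried: ReadResult = queryByStatus("OPEN");
```

Both calls return the same 100 orders, but the scan touches all 10,000 items while the indexed lookup touches only the matches. That gap widens as the table grows.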

vast torrent
#

I have one table where I need to look up data on city, state, zip, ownerId, assignedId, status, plus others, and then sort on it. I've used all 20 indexes. I'll probably never have more than 100k records in a table. Currently I'm looking into syncing all data to OpenSearch for sort and search, but don't have that working yet in Gen 2. Sorting on the standard GraphQL queries would be so helpful for my needs.

What options are available when needing to sort by pretty much all fields in the table? I would totally use a SQL datastore if that was an option and provided sort and text search.
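For a table of this size, one stopgap is to fetch every record and sort on the client. A minimal sketch, assuming the records are already loaded into memory: `TableRecord`, `sortByField`, and the sample data are hypothetical, with field names borrowed from the list above.

```typescript
// Hypothetical record shape; only the field names come from the discussion.
interface TableRecord {
  id: string;
  city: string;
  state: string;
  zip: string;
  status: string;
}

type Direction = "asc" | "desc";

// Return a sorted copy of `records`, ordered by any single string field.
function sortByField(
  records: TableRecord[],
  field: keyof TableRecord,
  direction: Direction = "asc",
): TableRecord[] {
  const sign = direction === "asc" ? 1 : -1;
  // Copy first so the caller's array is left untouched.
  return [...records].sort((a, b) => {
    const x = a[field];
    const y = b[field];
    if (x < y) return -sign;
    if (x > y) return sign;
    return 0;
  });
}

// Made-up sample data for illustration.
const sampleRecords: TableRecord[] = [
  { id: "a", city: "Denver", state: "CO", zip: "80202", status: "open" },
  { id: "b", city: "Austin", state: "TX", zip: "73301", status: "closed" },
  { id: "c", city: "Boston", state: "MA", zip: "02108", status: "open" },
];

const byCityAsc: TableRecord[] = sortByField(sampleRecords, "city");
const byZipDesc: TableRecord[] = sortByField(sampleRecords, "zip", "desc");
```

This avoids spending an index on every sortable field, at the cost of loading the whole table first, so it only makes sense while the record count stays modest.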

left moss
#

you can even combine DynamoDB and a SQL database in the same data schema

vast torrent
#

wow. I did not know that. I'm going to have to give that a try

#

Combining might help a lot!

left moss
#

But I do understand the frustration around DynamoDB; there is a reason PostgreSQL is so popular these days. We want to support that and provide more flexibility around data in general, so there are big plans for the future around SQL.

vast torrent
#

If you ever need any testers for SQL, OpenSearch or anything fun just let me know 🙂

balmy star
#

Thanks for the conversation, it seems like the sentiment is similar everywhere. I just wish they would eliminate the scan stuff and generate secondaryIndex queries for all of my data by default. The scan stuff will never be used in my app, and it clutters all of my GraphQL data. I just don't see the need for it at all when compared to the secondaryIndex. That, and maybe rename the filter to something like postScanFilter so it's more intuitive. The term filter to me implies that it's actually doing something meaningful in the search, when it really isn't.

left moss
#

that would be a difficult problem to solve because we'd need to know ahead of time how the data should be bucketed and sorted, and by which fields/attributes. Also, DynamoDB has a default limit of 20 global secondary indexes and a limit of 5 local secondary indexes per table, and those could be used up quickly if we created indexes automatically.
