#fix

14 messages · Page 1 of 1 (latest)

native mango
#

export const getOrphanedFiles = internalQuery({
args: {},
handler: async (ctx) => {
const ONE_HOUR_MS = 1000 * 60 * 60;
const cutoffTimeMs = Date.now() - ONE_HOUR_MS;
const orphanedFiles = await ctx.db
.query("files")
.withIndex("by_is_uploaded_created_at", q =>
q.eq("isUploaded", false).lte("createdAt", cutoffTimeMs)
)
.withIndex("by_article_id", q =>
q.eq("articleId", undefined)
)
.take(100);
return orphanedFiles;
},
});

files: defineTable({
articleId: v.optional(v.id("articles")),
name: v.string(),
contentType: v.string(),
sizeBytes: v.number(),
isPrivate: v.boolean(),
isUploaded: v.boolean(),
createdAt: v.number(),
updatedAt: v.number(),
})
.index("by_is_uploaded_created_at", ["isUploaded", "createdAt"])
.index("by_article_id", ["articleId"]),

Property 'withIndex' does not exist on type 'Query<{ document: { _id: Id<"files">; _creationTime: number; articleId?: Id<"articles"> | undefined; createdAt: number; updatedAt: number; name: string; contentType: string; sizeBytes: number; isPrivate: boolean; isUploaded: boolean; }; fieldPaths: ExtractFieldPaths<...> | "_id"; indexes: { ...; }; searchIndexes: {}...'.ts(2339)

grizzled hawkBOT
#

Hi @native mango. I'm kapa.ai, your friendly support bot. I will try to answer your question by searching through the knowledge base. :wave:

#

You can only call .withIndex once per query; a Convex query can use a single index, then optionally .filter, .order, and a consumer like .take. [Dynamic builders; QueryInitializer.withIndex]

To find “orphaned” files (not uploaded, older than 1 hour, and with no articleId), use one index and then a .filter:

export const getOrphanedFiles = internalQuery({
  args: {},
  handler: async (ctx) => {
    const ONE_HOUR_MS = 1000 * 60 * 60;
    const cutoffTimeMs = Date.now() - ONE_HOUR_MS;

    const orphanedFiles = await ctx.db
      .query("files")
      // Use a single index for the main range
      .withIndex("by_is_uploaded_created_at", (q) =>
        q.eq("isUploaded", false).lte("createdAt", cutoffTimeMs)
      )
      // Then post-filter on articleId being undefined
      .filter((q) => q.eq(q.field("articleId"), undefined))
      .take(100);

    return orphanedFiles;
  },
});

This matches the documented pattern: use an index to narrow to an efficient range, then .filter for additional conditions that can’t be expressed in the same index. [Indexes querying; Queries that scale]

Note: you can ask me a follow up question by @ mentioning me again :speech_balloon:

#

@native mango Kapa.ai is still learning and improving, please let me know how I did by reacting below.

native mango
#

@grizzled hawk actually i want to check whether isUpload is false and past 1 hr or articleId is not there and past 1 hr, do it in a way that is efficient

grizzled hawkBOT
#

Let me check the knowledge base for you again. One moment! :books:

#

You can’t efficiently express that full OR condition in a single index range, so the usual pattern is:

  1. Use an index to narrow to “past 1 hour” (so you don’t scan the whole table).
  2. Do the isUploaded / articleId logic in a filter (TypeScript or Convex filter).

Since both branches require “past 1 hr”, define an index on createdAt (or reuse an existing one if you have it), then filter in TypeScript:

// schema
files: defineTable({
  articleId: v.optional(v.id("articles")),
  name: v.string(),
  contentType: v.string(),
  sizeBytes: v.number(),
  isPrivate: v.boolean(),
  isUploaded: v.boolean(),
  createdAt: v.number(),
  updatedAt: v.number(),
})
  .index("by_createdAt", ["createdAt"]);
// query
export const getOrphanedFiles = internalQuery({
  args: {},
  handler: async (ctx) => {
    const ONE_HOUR_MS = 1000 * 60 * 60;
    const cutoffTimeMs = Date.now() - ONE_HOUR_MS;

    const candidates = await ctx.db
      .query("files")
      .withIndex("by_createdAt", (q) =>
        q.lte("createdAt", cutoffTimeMs)
      )
      .take(200); // small cap to stay efficient [[Sorting indexes](<https://docs.convex.dev/database/reading-data/indexes#sorting-with-indexes>)]

    const orphaned = candidates.filter(
      (file) =>
        (!file.isUploaded && file.createdAt <= cutoffTimeMs) ||
        (file.articleId === undefined && file.createdAt <= cutoffTimeMs),
    );

    return orphaned.slice(0, 100);
  },
});

This is efficient because:

#
  • The index limits the scan to “older than 1 hour”, so cost is proportional to old files, not the whole table. [Querying indexes]
  • The OR logic is done in TypeScript, which is the recommended approach for complex filters that don’t map cleanly to a single index. [Complex filters]

Note: you can ask me a follow up question by @ mentioning me again :speech_balloon:

#

@native mango Kapa.ai is still learning and improving, please let me know how I did by reacting below.

native mango
#

@grizzled hawk but arent there going to be a lot of file past 1 hr

grizzled hawkBOT
#

Let me check the knowledge base for you again. One moment! :books:

#

You’re right to worry about “lots of files past 1 hr” — you want to avoid scanning a huge range.

Given your schema and indexes:

files: defineTable({
  articleId: v.optional(v.id("articles")),
  // ...
  isUploaded: v.boolean(),
  createdAt: v.number(),
})
  .index("by_is_uploaded_created_at", ["isUploaded", "createdAt"])
  .index("by_article_id", ["articleId"]);

You can make this efficient by splitting the OR into two index-backed queries, each of which only scans a small, targeted range, and then merging the results in memory.

1. isUploaded === false and older than 1 hour

Use your by_is_uploaded_created_at index; this is very efficient because the range is contiguous in that index. [Indexes querying]

const ONE_HOUR_MS = 1000 * 60 * 60;
const cutoffTimeMs = Date.now() - ONE_HOUR_MS;

const notUploadedOld = await ctx.db
  .query("files")
  .withIndex("by_is_uploaded_created_at", (q) =>
    q.eq("isUploaded", false).lte("createdAt", cutoffTimeMs)
  )
  .take(100); // or paginate

This query’s cost is proportional to “files with isUploaded === false and createdAt <= cutoffTimeMs”, not the whole table. [Queries that scale]

2. articleId === undefined and older than 1 hour

Your by_article_id index is ordered by (articleId, _creationTime), not createdAt, so you can’t express “older than 1 hour” as an index range there. [Indexes querying]

But you can efficiently get “all files with articleId === undefined” via that index, then filter by time:

#
const noArticle = await ctx.db
  .query("files")
  .withIndex("by_article_id", (q) => q.eq("articleId", undefined))
  .filter((q) => q.lte(q.field("createdAt"), cutoffTimeMs))
  .take(100);

Here, the index narrows to “no articleId”, and the .filter only scans that subset. The docs explicitly recommend this pattern: use .withIndex to narrow, then .filter for extra conditions. [Indexes querying]

3. Combine and de‑duplicate

Finally, merge the two sets and de‑duplicate by _id:

export const getOrphanedFiles = internalQuery({
  args: {},
  handler: async (ctx) => {
    const ONE_HOUR_MS = 1000 * 60 * 60;
    const cutoffTimeMs = Date.now() - ONE_HOUR_MS;

    const [notUploadedOld, noArticleOld] = await Promise.all([
      ctx.db
        .query("files")
        .withIndex("by_is_uploaded_created_at", (q) =>
          q.eq("isUploaded", false).lte("createdAt", cutoffTimeMs)
        )
        .take(100),
      ctx.db
        .query("files")
        .withIndex("by_article_id", (q) => q.eq("articleId", undefined))
        .filter((q) => q.lte(q.field("createdAt"), cutoffTimeMs))
        .take(100),
    ]);

    const byId = new Map<string, typeof notUploadedOld[number]>();
    for (const f of [...notUploadedOld, ...noArticleOld]) {
      byId.set(f._id.toString(), f);
    }

    return Array.from(byId.values());
  },
});

This way:

#
  • You never scan “all files older than 1 hr”.
  • Each branch uses an index to keep the scanned range small.
  • The OR is implemented in TypeScript by combining the two result sets, which is the recommended approach for complex filters that don’t map cleanly to a single index. [Complex filters; Combining indexes]

If you later find that articleId === undefined is still a huge set, you may need an additional index that includes both articleId and createdAt so you can bound by time directly in the index range. The knowledge sources don’t show that exact pattern for your case, but they do show that multi-field indexes are the way to make such combined filters efficient. [Queries that scale]

Note: you can ask me a follow up question by @ mentioning me again :speech_balloon: