# PDF Search Using PDFIndex in Examine Not Returning Results


mortal ridge
#

❓ Questions:

  1. Do I need to specify a media root node for searching in PDFs?

    • Right now, I am NOT filtering by a specific folder, but still get 0 results.
    • If I need to filter by folder, how should I implement that?
  2. Why does Examine Management find PDFs in the Backoffice, but my query in the service returns 0 results?

    • Is my query missing something?
    • Do I need to specify "__IndexType:pdf" in the query?
  3. Is there a better way to link PDFs to the pages they are embedded in?

    • If I want to display the page where the PDF is used, what’s the best approach?
ripe palm
#

When you create your document query and you start with:

var criteria = searcher.CreateQuery("content", BooleanOperation.And)

That corresponds to a Lucene query that requires __IndexType: content. This is helpful in the external index, because you won't get items with __IndexType: media in the results.

However, in your document search you have this:

var criteria = searcher.CreateQuery("media", BooleanOperation.And)

Which corresponds to __IndexType: media, whereas, at least on the indexes I have running with the PdfIndex, the __IndexType field has the value pdf.

Also, if you set a breakpoint right after you execute the search, you can see the full Lucene search string on your criteria. It is often really helpful for debugging.
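As a rough illustration of the debugging tip above, the query can also be turned into a string and logged without a breakpoint. This is only a sketch under some assumptions: it assumes Examine 2 or later (where an index exposes a Searcher property), and the class and method names PdfQueryDebug and DescribeQuery are hypothetical. The index name "PDFIndex" and the field "fileTextContent" are taken from this thread.

```csharp
using Examine;
using Examine.Search;

public static class PdfQueryDebug
{
    // Hypothetical helper: builds (but does not execute) a query against the
    // PDF index for the given category and returns its string form for logging.
    public static string DescribeQuery(IExamineManager examineManager, string category, string term)
    {
        // "PDFIndex" is the index name used in this thread; adjust if yours differs.
        if (!examineManager.TryGetIndex("PDFIndex", out IIndex index))
        {
            return "PDFIndex is not registered";
        }

        // The category passed to CreateQuery becomes the __IndexType filter
        // when the query is executed.
        IBooleanOperation criteria = index.Searcher
            .CreateQuery(category, BooleanOperation.And)
            .Field("fileTextContent", term);

        // The string form includes both the category and the Lucene query.
        return criteria.ToString();
    }
}
```

Calling DescribeQuery once with "media" and once with "pdf" should give strings that differ only in the category part, which makes it easy to spot when a query is filtering on an __IndexType value that no document in the index has.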

mortal ridge
#

@ripe palm Thanks for your response!
I’ve read through the blog post you mentioned, and it was very insightful. However, I’m still struggling with why my document search is returning no results.

For context:

  1. When I search in Pages, everything works perfectly. Here's an example of the generated Lucene query:
Category: content, LuceneQuery: +(combinedField:alexander~2) +__Published:y +searchablePath:1591 -hideFromInternalSearch:1 -__NodeTypeAlias:usnsitemapxml -__NodeTypeAlias:usnrobotstxt

This query returns the expected results.

  2. When I search in Documents (using the PDFIndex), my query looks like this:
Category: media, LuceneQuery: +fileTextContent:konzept~2

However, this query returns 0 results, even though I can find the correct document (Marketing Konzept Hunziker) in Examine Management under the same PDFIndex when I search manually.

Here’s where I’m confused:

  • Does the query need additional fields to properly target the PDFIndex? Should I explicitly include something like +__IndexType:pdf, or is that already implied by the query’s Category: media?
  • When debugging, how can I verify that the query is actually searching where it should?

Just to clarify, the generated Lucene query for document search looks like this when I log it:

Category: media, LuceneQuery: +fileTextContent:konzept~2

It seems correct, but something is still missing. Any advice or guidance on this would be greatly appreciated!

ripe palm
#

I tried to explain it in my comment above, but basically, when you write this in your code:

var criteria = searcher.CreateQuery("media", BooleanOperation.And)

you make a mistake. It should instead be:

var criteria = searcher.CreateQuery("pdf", BooleanOperation.And)

The code you have now adds an __IndexType filter for media, which returns 0 results because all indexed PDFs by default have __IndexType: pdf, not media.
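Put together, a corrected document search might look roughly like the sketch below. It is not the original service from this thread: the class name PdfSearchService and the method SearchPdfIds are hypothetical, it assumes Examine 2 or later and the default PDF index setup where the index is named "PDFIndex", the category is "pdf", and the extracted text is stored in "fileTextContent".

```csharp
using System.Collections.Generic;
using System.Linq;
using Examine;
using Examine.Search;

// Hypothetical service, shown only to illustrate the "pdf" category fix.
public class PdfSearchService
{
    private readonly IExamineManager _examineManager;

    public PdfSearchService(IExamineManager examineManager)
    {
        _examineManager = examineManager;
    }

    // Returns the ids of indexed PDFs whose extracted text matches the term.
    public IReadOnlyList<string> SearchPdfIds(string term)
    {
        if (string.IsNullOrWhiteSpace(term)
            || !_examineManager.TryGetIndex("PDFIndex", out IIndex index))
        {
            return new List<string>();
        }

        ISearchResults results = index.Searcher
            // "pdf" (not "media") matches the __IndexType stored for indexed PDFs.
            .CreateQuery("pdf", BooleanOperation.And)
            // Fuzzy() allows near matches, similar to the ~ in the logged query.
            .Field("fileTextContent", term.Fuzzy())
            .Execute();

        return results.Select(r => r.Id).ToList();
    }
}
```

With the category set this way there should be no need to add +__IndexType:pdf to the query by hand, since the category already acts as that filter.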