#Indexing SearchVector makes query slower in django ORM - PostgreSQL

24 messages · Page 1 of 1 (latest)

ocean burrow
#

Hello! I have a performance problem related to full-text search in PostgreSQL. Indexing a SearchVector using a GIN index doesn't improve performance, and adding a SearchVectorField (populate it, index it, and add update trigger for it) actually throws it under the bus (makes the table size 7 times bigger [from 7GB to 50GB] and count(*) queries don't return in an hour for 8m records).

For more details please check the question I have put on stackoverflow:
https://stackoverflow.com/questions/75983345/indexing-searchvector-makes-query-slower-in-django-orm-postgresql

tulip folio
#

Have you used explain and/or explain analyze with the query?

ocean burrow
#

I decided to go with GIN indexes and triagrams and icontains. If I don't manage to get acceptable timings with this method I'll try to analyze this and the initial method I tried and I'll keep this and stackoverflow post updated.

tulip folio
#

The query and explain are necessary to improve performance on a query.

ocean burrow
#

I know, it's kinda stupid of me to not analyze it first, but it's my first time working with that much data. I have never indexed a db before etc.

#

Do you know from where can I get an outline on which indexes help (the planner is using them) on which query-types-functions? Except waiting for the timings and guess?

tulip folio
#

Try just explain. Analyze runs the query.

ocean burrow
#

So explains tells me if an index is used?

#

Apart from other things ofcourse.

#

I am definitely going to try these commands anyways after my indexes are finished building.

tulip folio
#

I believe so.

ocean burrow
#

That's right. GIN or GIST works best for text. GIN is slower to build/update but faster at retrieval and via versa. I'll try to analyze my queries when index building is done and update the posts if I can't figure it out. Thanks for your time.

ocean burrow
#

I analyzed the query. It did a sequential scan. When I forced it to use the indexes via a flag it got good timings, but it's not acceptable to trick the planner via flags in a production environment so I am gonna build an index on the compound columns (title and abstract) so the planner uses it when there are 2 LIKES connected with OR on them.

#

If I was postgres I would use index1 to fetch results for the first or clause, and then I would use index2 to fetch results for the second or clause and then union them. That would be faster, but I guess it can't do it, so I am going to create the compound index on both of them.

tulip folio
#

You could structure your query to use a union instead of an OR.

ocean burrow
#

You must think I am genius by now. I'll do my thing because I don't need separate indexes, it was bad design from the start. Thank you for your help!

tulip folio
#

I always appreciate the follow-up on what works/what doesn't. It helps me know where to start when I tackle similar problems in the future!

ocean burrow
#

I am sure the union thing would work. I'll update tommorow if the compound index acts as expected.

ocean burrow
#

Hello friend. At last, I used the full-text search of Postgres which I think yields the best results of all methods for my use case. I created a virtual ts_vector column and then indexed it using a gin index. Timings for searching keywords vary greatly from some milliseconds to 10s (depending most likely on the internal cache), although I am happy with the average.

#

I am not gonna lie, django didn't help me a lot with that, I had to write SQL my self cause django's method is kinda outdated.

#

I created the column and indexes like that in a custom migration file.

        RunSQL("""
                ALTER TABLE main_patent DROP COLUMN IF EXISTS search; 
                ALTER TABLE main_patent ADD search TSVECTOR GENERATED ALWAYS AS (
                    setweight(to_tsvector('english', title), 'A') || ' ' ||
                    setweight(to_tsvector('english', abstract), 'B') :: TSVECTOR
                ) STORED;""",
            reverse_sql="ALTER TABLE main_patent DROP COLUMN search;"
        ),
        RunSQL("CREATE INDEX main_patent_search_idx ON main_patent USING GIN (search);",
            reverse_sql="DROP INDEX main_patent_search_idx;"),

tulip folio
#

I'm glad you found a solution

ocean burrow
#

Thanks. Now analyze has become my favorite command in postgres.