In one of my projects I’m processing large amounts of XML files containing
various information about sport events. I have to record all the files that made
modifications for the given sport event. Since file can be sometimes re-uploaded
I’ve decided to use convenient Rails helper:
It worked like a charm - quick and easy solution to my problem.
After a while I noticed increase in processing times for a single file.
Short investigation lead to the first_or_create! line of code:
WOW - it took 1.5s to execute this simple function!
Next step was to run explain analyze on the underlying query:
Because of the ORDER BY "sport_event_files"."id" at the end Postgresql choose to
use index on primary key for the sorting and filter by conditions.
This essentially resulted in doing seq scan of the table with almost 5.5 million rows (always a bad idea ;)
Here is what happened when I removed the ORDER BY part.
I ended up with index-only scan and query time measured in fractions of millisecond!
I re-wrote my code using .exists? and everything was great again!