Reducing index data size

At some point you may find your index files have grown dramatically, and you'll either need to upgrade your account or reduce their size again.

If you're particularly keen on the latter, here are some things to consider:

  • The size of your indices really comes down to how many fields and attributes you have, and how large they are (especially fields). So a good place to start is to remove any fields or attributes you're not using.
  • If you're using infixes or prefixes, they can increase the size of indices dramatically. With min_infix_len set to 1, a word like 'ruby' is stored as 'ruby', 'r', 'u', 'b', 'y', 'ru', 'ub', 'by', 'rub' and 'uby'. Consider disabling this feature, or raising the value to something larger (say, 3) to reduce the number of entries indexed. And if you're using infixes, perhaps prefixes (min_prefix_len) could do the job just as well?
  • Stemmers/morphologies are probably another reason index files get big quickly: many words end up stored in several forms for different tenses or contexts.
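To get a feel for how quickly infixes add up compared to prefixes, here is a minimal sketch that enumerates the distinct substrings (infixes) and prefixes of a single word for a given minimum length. It is a simplified model, not Sphinx's actual dictionary format: it only counts the extra entries per word and ignores how Sphinx encodes them on disk, so treat the numbers as an illustration of growth rather than real index sizes. The function names are hypothetical.

```python
from __future__ import annotations


def infixes(word: str, min_len: int) -> set[str]:
    """Every distinct substring of `word` that is at least `min_len` characters long."""
    n = len(word)
    return {
        word[start:end]
        for start in range(n)
        for end in range(start + min_len, n + 1)
    }


def prefixes(word: str, min_len: int) -> set[str]:
    """Every prefix of `word` that is at least `min_len` characters long."""
    return {word[:end] for end in range(min_len, len(word) + 1)}


if __name__ == "__main__":
    for n in (1, 2, 3):
        # Prints 10, 6 and 3 entries respectively for 'ruby'.
        print(f"min_infix_len={n}: {len(infixes('ruby', n))} entries")
    # Prints 4 entries: 'r', 'ru', 'rub', 'ruby'.
    print(f"min_prefix_len=1: {len(prefixes('ruby', 1))} entries")
```

For a four-letter word, a minimum infix length of 3 cuts the entries from 10 to 3, and prefixes alone at length 1 only produce 4. Longer words widen the gap, since the number of infixes grows roughly with the square of the word length while prefixes grow linearly.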

Of course, these features are genuinely useful, so turning them off may not be acceptable. If that's the case, there may be no easy way to reduce your index file sizes.

To test all of this, I recommend pulling a database backup from Heroku onto your local machine and processing the indices there. If you're using the Thinking Sphinx gem with Ruby, make sure any settings you apply in config/thinking_sphinx.yml or config/sphinx.yml are consistent across each environment.
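When testing locally, it helps to measure each index before and after changing a setting. The script below is a minimal sketch that totals file sizes per index in one directory. It assumes the index files live in db/sphinx/development (a hypothetical default; check the indices_location your configuration actually uses, or pass the directory as the first argument), and that each index's files share a base name, as Sphinx's .spa/.spd/.spi-style files do. Any other files in that directory simply show up as their own entries.

```python
from __future__ import annotations

import sys
from collections import defaultdict
from pathlib import Path


def index_sizes(directory: Path) -> dict[str, int]:
    """Total bytes per index, grouping files by their name without the extension.

    Sphinx writes each index as several files sharing one base name
    (e.g. article_core.spa, article_core.spd, ...), so summing by stem
    gives a per-index total.
    """
    totals: dict[str, int] = defaultdict(int)
    for path in directory.iterdir():
        if path.is_file():
            totals[path.stem] += path.stat().st_size
    return dict(totals)


def main() -> None:
    # Hypothetical default location; adjust to match your own configuration.
    directory = Path(sys.argv[1] if len(sys.argv) > 1 else "db/sphinx/development")
    if not directory.is_dir():
        print(f"No such directory: {directory}", file=sys.stderr)
        return

    sizes = index_sizes(directory)
    # Largest indices first, so the biggest offenders are easy to spot.
    for name, size in sorted(sizes.items(), key=lambda item: item[1], reverse=True):
        print(f"{size / 1_048_576:10.2f} MB  {name}")
    print(f"{sum(sizes.values()) / 1_048_576:10.2f} MB  total")


if __name__ == "__main__":
    main()
```

Saved under a name of your choosing (say, index_sizes.py), you could run it once, rebuild the indices with a changed setting such as a larger min_infix_len, and run it again to compare the totals.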