Indexing failing

You Jing Wong's Avatar

You Jing Wong

02 Mar, 2018 03:19 PM

Hi, I have a couple of indexes which are failing

indexing index 'issue_core'...
ERROR: index 'issue_core': sql_range_query: ERROR:  integer out of range
indexing index 'tracker_core'...
ERROR: index 'tracker_core': raw_hits: write error: 474119 of 1048450 bytes written.

For the first error, I tried to solve it by using big_document_ids, as you can see here
https://github.com/bountysource/core/blob/master/app/indices/issue_...

However, the problem persists.

For the second error, would appreciate some help diagnosing this.

Setup:-

gem 'thinking-sphinx', '~> 3.1.3'
gem 'flying-sphinx', '~> 1.0.0'


Sphinx 2.1.4-id64-release (rel21-r4421) herokuapp - bs-prod

Thanks for your help!

  1. Support Staff 1 Posted by Pat Allan on 03 Mar, 2018 12:22 AM

    Pat Allan's Avatar

    Hi there,

    I’ve just been looking into this. Regarding the first issue: the integer out of range error is often due to timestamps rather than primary keys - Sphinx stores timestamps as 32-bit integers (based on UNIX timestamps), so anything that is beyond the year 2106 is going to cause problems. The only timestamp in your issue index is remote_created_at - do you allow human-entered values for that column? And are there any values that are 80+ years in the future? I'm guessing there might be some typos that you'll need to manually fix.

    Regarding the second issue… it looks like you’re currently over the plan limit for the amount of data you’re dealing with, and are running out of disk space. As a temporary change, removing the :infixes option from the name field or increasing the :min_infixes_len setting should help reduce the amount of disk space required.

    Also, within your tracker index, you may be able to improve the indexing speed, by removing the following attribute:

    has '(select count(*) from issues where tracker_id=trackers.id)', as: :issue_count, type: :integer
    

    And replacing it with these two lines:

    joins :issues
    has “COUNT(issues.id)”, as: :issue_count, type: :integer
    

    Certainly, test that change locally first :)

    Let me know how you go with all of this - certainly keen to get it all working with you!

  2. 2 Posted by You Jing Wong on 05 Mar, 2018 12:51 AM

    You Jing Wong's Avatar

    Thank you.

    I've found the dates that are invalid. Thank you for your help! :)

  3. Support Staff 3 Posted by Pat Allan on 05 Mar, 2018 12:59 AM

    Pat Allan's Avatar

    Good to know - hopefully indexing's more reliable now :)

  4. 4 Posted by You Jing Wong on 08 Mar, 2018 04:58 PM

    You Jing Wong's Avatar

    Hi again,

    I've fixed all the remote_created_at dates such that it's always less than year 2106.

    I've also checked all the other values, and it doesn't seem like there is anything else that would trigger this error.

    Do you have any idea what I can do to debug this better?

  5. Support Staff 5 Posted by Pat Allan on 09 Mar, 2018 01:57 PM

    Pat Allan's Avatar

    It seems the primary problem is disk space. So, I’ve temporarily shifted you over to a new server, processed the indices, and they’re *still* using all of the server’s disk space… (which is around 20GB - far more than the 5GB allowed by the plan you’re on).

    We’re going to need to find a better long-term solution!

    There’s two parts to this. First off: it’s looking like you’re currently using even more data than the next plan level (15GB for $300/month on a dedicated machine). These dedicated machines are not generally available (hence the plan’s not listed publicly), but I can fire one up for you if you definitely want to go down that path. If you feel the data’s only going to grow further, then I can consider what the options could be for a larger server and how much that might cost.

    To get your disk usage under control at least for the short term, I would recommend you remove the infix setting from your Tracker index. This should reduce the size of that index significantly (and it seems it’s the largest), though it will mean partial/wildcard matches will not work.

    Something that may also help, though likely not dramatically, is upgrading Sphinx. You’re using 2.1.4, and I recommend upgrading to 2.2.11 - this is controlled in your `config/thinking_sphinx.yml` file, but changing will require running the rebuild task, and that means Sphinx will be down for the full time it takes the indices to be reprocessed (which, going by the most recent run, is at least 4 hours). Also, if the indexing fails (like it has been recently!) then the Sphinx daemon may not be able to boot. So maybe this comes after getting the indices processing correctly on 2.1.4 first before considering such a change.

    It does also seem like there might still be some invalid dates, and I’ll look into that over the coming days.

    If any of this doesn’t make sense, or if you have further questions, do let me know.

    Kind regards,


    Pat

  6. Support Staff 6 Posted by Pat Allan on 11 Mar, 2018 12:18 AM

    Pat Allan's Avatar

    I've spent some time over the past day trying to figure out that integer-out-of-range issue, and not having much luck. The date values are indeed fine, and I don't see anything obvious about how the other attribute values could cause such a problem (all the integers are clearly less than what can be held within a 32-bit integer).

    However, one thing that may help improve the processing speed for the Issue index: I realise you've got the languages association in Issue set up to go through tracker, but in the index it's better to be more explicit. The change:

    has languages(:id), :as => :language_ids
    # becomes:
    has tracker.languages.id, :as => :language_ids
    

    The reason for this is to ensure TS generates just one join to the trackers table (it's currently generating two). This may be fixed in later versions of TS (you're using quite an old version!), but this workaround will be more reliable.

    Also, I've removed the symbol for :id - that was only needed for TS v2 and older (but the behaviour should remain the same, so it's not a big deal, and certainly not related to the problems at hand).

Reply to this discussion

Internal reply

Formatting help / Preview (switch to plain text) No formatting (switch to Markdown)

Attaching KB article:

»

Attached Files

You can attach files up to 10MB

If you don't have an account yet, we need to confirm you're human and not a machine trying to post spam.

Keyboard shortcuts

Generic

? Show this help
ESC Blurs the current field

Comment Form

r Focus the comment reply box
^ + ↩ Submit the comment

You can use Command ⌘ instead of Control ^ on Mac