Rebuild not working for Heroku app

Nick Branstator

Oct 14, 2015 @ 01:03 PM

Search is down for our web application after we attempted a rebuild. The message we are receiving when we try to rebuild is this:

Action timed out. If this is happening regularly, please contact Flying Sphinx support: http://support.flying-sphinx.com
Action Finished: start

We are unsure of how to resolve the situation and to get Flying Sphinx working again.

  1. Posted by Pat Allan (Support Staff) on Oct 14, 2015 @ 01:07 PM

    Hi Nick

    Can you let me know which app this is?

    Kind regards,

    Pat

  2. Posted by Nick Branstator on Oct 14, 2015 @ 01:17 PM

    Hi Pat,

    The app is "phoenix-production". Interestingly, the daemon was
    automatically restarted by something - maybe a scheduled index task - and
    now it appears to be working fine.

    Thanks,
    Nick Branstator

  3. Posted by Pat Allan (Support Staff) on Oct 14, 2015 @ 01:20 PM

    Thanks Nick - I thought it might be that app. I did see a glitch and got the daemon running as quickly as I could. I've yet to figure out the cause, but I'll investigate soon. Normally monit sorts things out by itself, so I'm not sure why it didn't this time.

  4. Posted by Nick Branstator on Oct 14, 2015 @ 01:26 PM

    Understood. Thanks for taking care of the restart, and will be interested
    to hear what you learn from your investigation.

    - Nick

  5. Posted by Nick Branstator on Oct 19, 2015 @ 02:37 PM

    Hi Pat,

    Seems like we are still having problems: Attempts to index are failing, and so is the rebuild action. We think index attempts may have been failing since last Wednesday, when we previously surfaced the problem. All of our index attempts have empty logs.

    - Nick

  6. Posted by Pat Allan (Support Staff) on Oct 19, 2015 @ 02:50 PM

    Hi Nick

    I just ran a full index manually, and it worked (daemon is running as well). Trying to figure out where things are going wrong between the Flying Sphinx API and the Sphinx indexer commands.

    Perhaps unrelated, but just in case: can you let me know which versions of thinking-sphinx, flying-sphinx and Ruby you're using?

    Will let you know when I've got something more concrete. It's my top priority right now.

  7. Posted by Nick Branstator on Oct 19, 2015 @ 03:03 PM

    Ruby 2.1.2
    flying-sphinx (1.2.0)
    thinking-sphinx (3.1.4)

    We noticed that, while we can see your index action in the Heroku dashboard
    for Flying Sphinx, we still see no log for it.

  8. Posted by Pat Allan (Support Staff) on Oct 19, 2015 @ 03:15 PM

    That was actually a second test I made, through the API, and it seemed to have the same problem as your calls. It seems to be related to Kernel and/or STDIN/STDOUT not communicating correctly from Sidekiq (on your specific Sphinx server). Restarting Sidekiq has fixed the issue, so I guess that's a short-term fix in place.

    Annoyingly, there are no exceptions being raised, but I'll put something custom in place (essentially, no indexer log = raise) so I can at least track if/when it next happens and look at what may have caused it in more detail.

    For now, though, things should be operating properly. If you hit any issues, do let me know.
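
The "no indexer log = raise" idea above could look roughly like the following. This is only a minimal sketch, not the actual Flying Sphinx implementation: the `EmptyIndexerLogError` class and the `ensure_indexer_log!` helper are hypothetical names made up for illustration, and the log is assumed to arrive as a plain string (or nil).

```ruby
# Hypothetical error class; not part of flying-sphinx or thinking-sphinx.
class EmptyIndexerLogError < StandardError; end

# Hypothetical guard: returns the log unchanged when it has content,
# and raises when it is nil or contains only whitespace.
def ensure_indexer_log!(log, action: "index")
  if log.nil? || log.strip.empty?
    raise EmptyIndexerLogError,
          "#{action} action finished with an empty indexer log"
  end
  log
end
```

Raising a dedicated error class means an exception tracker can group these failures separately, which fits the goal of seeing if and when the problem next happens.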

  9. Posted by Nick Branstator on Oct 19, 2015 @ 03:15 PM

    Additionally, the reindex that you executed does not actually seem to have
    updated the index; at the least, we are not seeing any of the new data that
    should be visible.

    - Nick

  10. Posted by Pat Allan (Support Staff) on Oct 19, 2015 @ 03:56 PM

    Well, that placeholder exception's being raised more often than not. The fact that it's inconsistent is particularly frustrating… I'm going to keep hunting through why it's happening.

    The catch is that I've got two flights ahead of me (the first is just over an hour, the second is three and a half hours), with a break of two hours or so in between, so my communications and debugging are going to be hindered a bit. Whenever I can be online working on it, I will be, and in the meantime I'll be trying to reproduce the issue offline.

  11. Posted by Pat Allan (Support Staff) on Oct 19, 2015 @ 03:59 PM

    The indexer data files are definitely being updated, and Sphinx is rotating the new files into place, so I'd expect results to be up-to-date. Can you run me through the data you're expecting to see and the search queries you're running?

  12. Posted by Pat Allan (Support Staff) on Oct 20, 2015 @ 01:49 AM

    Hi Nick

    Very sorry for the delay on this. Have got through my flights, worked through the problem, and things are working now. Redis has been upgraded, which is helping Sidekiq run more smoothly, and I'm not seeing any more IO errors. I'll keep an eye on things, but I've just run several index calls (via the API, so, same behaviour as what you should see), and the output is coming through properly.

    If you're still seeing data not appearing which you'd expect to see, let's talk through the queries and the expected data and try to debug that further.

    Many thanks for your patience.

    Pat

  13. Posted by Nick Branstator on Oct 26, 2015 @ 03:39 PM

    Hi Pat,

    We are having a problem again. We requested a simple rebuild after a
    production push of our application today; the rebuild has now been running
    for over two hours. Normally it completes within about 30 minutes. The
    log on the dashboard has not updated in more than an hour.

    - Nick

  14. Posted by Pat Allan (Support Staff) on Oct 26, 2015 @ 04:04 PM

    Daemon's now back up, am debugging the issue further.

  15. Posted by Pat Allan (Support Staff) on Oct 26, 2015 @ 04:32 PM

    Have put more detailed logging in place, and now the error's disappeared - which is frustrating, but seems to be par for the course for this issue. It's also odd that you're the only customer it's cropping up for (thus, particularly annoying for you, and fewer data points for me).

    If I find anything further, I'll let you know.
