Rebuild not working for Heroku app
Search is down for our web application after we attempted a rebuild. The message we are receiving when we try to rebuild is this:
Action timed out. If this is happening regularly, please contact Flying Sphinx support: http://support.flying-sphinx.com
Action Finished: start
We are unsure how to resolve the situation and get Flying Sphinx working again.
Support Staff 1 Posted by Pat Allan on Oct 14, 2015 @ 01:07 PM
Hi Nick
Can you let me know which app this is?
Kind regards,
Pat
2 Posted by Nick Branstator on Oct 14, 2015 @ 01:17 PM
Hi Pat,
The app is "phoenix-production". Interestingly, the daemon was
automatically restarted by something - maybe a scheduled index task - and
now it appears to be working fine.
Thanks,
Nick Branstator
Support Staff 3 Posted by Pat Allan on Oct 14, 2015 @ 01:20 PM
Thanks Nick - I thought it might be that app. I did see a glitch and got the daemon running as quickly as I could. Yet to figure out the cause, but I'll investigate soon. Normally monit will sort things out itself, not sure what caused this issue just yet.
4 Posted by Nick Branstator on Oct 14, 2015 @ 01:26 PM
Understood. Thanks for taking care of the restart, and will be interested
to hear what you learn from your investigation.
- Nick
5 Posted by Nick Branstator on Oct 19, 2015 @ 02:37 PM
Hi Pat,
Seems like we are still having problems: attempts to index are failing, and so is the rebuild action. We think index attempts may have been failing since last Wednesday, when we first raised the problem. All of our index attempts have empty logs.
- Nick
Support Staff 6 Posted by Pat Allan on Oct 19, 2015 @ 02:50 PM
Hi Nick
I just ran a full index manually, and it worked (daemon is running as well). Trying to figure out where things are going wrong between the Flying Sphinx API and the Sphinx indexer commands.
Perhaps unrelated, but just in case: can you let me know which versions of thinking-sphinx, flying-sphinx and Ruby you're using?
Will let you know when I've got something more concrete. It's my top priority right now.
7 Posted by Nick Branstator on Oct 19, 2015 @ 03:03 PM
Ruby 2.1.2
flying-sphinx (1.2.0)
thinking-sphinx (3.1.4)
We noticed that, while we can see your index action in the Heroku dashboard
for Flying Sphinx, we still see no log for it.
Support Staff 8 Posted by Pat Allan on Oct 19, 2015 @ 03:15 PM
That was actually a second test I made, through the API, and it seemed to have the same problem as your calls. It seems to be related to Kernel and/or STDIN/STDOUT not communicating correctly from Sidekiq (on your specific Sphinx server). Restarting Sidekiq has fixed the issue, so I guess that's a short-term fix in place.
Annoyingly, there are no exceptions being raised, but I'll put something custom in place (essentially, no indexer log = raise) so I can at least track if/when it next happens and look at what may have caused it in more detail.
For now, though, things should be operating properly. If you hit any issues, do let me know.
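As an illustration only (not the actual Flying Sphinx code), the "no indexer log = raise" guard described above could be sketched in Ruby roughly as follows. The method name `run_with_log_check` and the error class `EmptyIndexerLogError` are hypothetical, and the sketch assumes the indexer is run as a child process whose combined output is the log.

```ruby
require "open3"

# Hypothetical error class; not part of flying-sphinx or thinking-sphinx.
class EmptyIndexerLogError < StandardError; end

# Runs a command, captures its combined stdout and stderr, and raises
# if nothing at all was captured (the "no indexer log = raise" rule).
def run_with_log_check(*command)
  output, status = Open3.capture2e(*command)

  if output.strip.empty?
    raise EmptyIndexerLogError,
          "no output from: #{command.join(' ')} (exit #{status.exitstatus})"
  end

  output
end
```

In practice the command would be whatever invocation runs the Sphinx indexer; a silent run then surfaces as an exception that can be tracked, rather than as an empty log on the dashboard.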
9 Posted by Nick Branstator on Oct 19, 2015 @ 03:15 PM
Additionally, the reindex that you executed does not actually seem to have
updated the index; at the least, we are not seeing any of the new data that
should be visible.
- Nick
Support Staff 10 Posted by Pat Allan on Oct 19, 2015 @ 03:56 PM
Well, that placeholder exception's being raised more often than not. The fact that it's inconsistent is particularly frustrating… I'm going to keep hunting through why it's happening.
The catch is I've got two flights to catch (the first is just over an hour, the second is three and a half hours), with a break of two hours or so in between, so my communications and debugging are going to be hindered a bit. Whenever I can be online working on it, I will be, and in the meantime I'll be trying to reproduce the issue offline.
Support Staff 11 Posted by Pat Allan on Oct 19, 2015 @ 03:59 PM
The indexer data files are definitely being updated, and Sphinx is rotating the new files into place, so I'd expect results to be up-to-date. Can you run me through the data you're expecting to see and the search queries you're running?
Support Staff 12 Posted by Pat Allan on Oct 20, 2015 @ 01:49 AM
Hi Nick
Very sorry for the delay on this. Have got through my flights, worked through the problem, and things are working now. Redis has been upgraded, which is helping Sidekiq run more smoothly, and I'm not seeing any more IO errors. I'll keep an eye on things, but I've just run several index calls (via the API, so, same behaviour as what you should see), and the output is coming through properly.
If you're still seeing data not appearing which you'd expect to see, let's talk through the queries and the expected data and try to debug that further.
Many thanks for your patience.
Pat
13 Posted by Nick Branstator on Oct 26, 2015 @ 03:39 PM
Hi Pat,
We are having a problem again. We requested a simple rebuild after a
production push of our application today; the rebuild has now been running
for over two hours. Normally it completes within about 30 minutes. The
log on the dashboard has not updated in more than an hour.
- Nick
Support Staff 14 Posted by Pat Allan on Oct 26, 2015 @ 04:04 PM
Daemon's now back up, am debugging the issue further.
Support Staff 15 Posted by Pat Allan on Oct 26, 2015 @ 04:32 PM
Have put more detailed logging in place, and now the error's disappeared - which is frustrating, but seems to be par for the course for this issue. It's also odd that you're the only customer it's cropping up for (thus, particularly annoying for you, and fewer data points for me).
If I find anything further, I'll let you know.