DelayedDelta doesn't seem to be updating indexes

Erhan Berber's Avatar

Erhan Berber

26 Jul, 2016 10:23 AM

Hi, in my app heroku ruby app I am using flying sphinx Ceramic with delta indexes. My "User" model index file is declared as:
ThinkingSphinx::Index.define :user, :with => :active_record, :delta => ThinkingSphinx::Deltas::DelayedDelta do
 ....
end
Gemfile has the following:
gem 'thinking-sphinx', '3.1.0'
gem 'flying-sphinx', '1.2.0'
gem 'ts-delayed-delta'

User model has the "delta" attribute, and it is properly set to "true" after an update. But, the delta processing doesn't seem to occur. The record's delta value is stuck at "true" and that record is excluded from sphinx search results.

Any idea what could be causing this? I do have 1 worker dyno in heroku and delayed job is working fine for everything else that I need it for.

Any help will be appreciated. Thank you.

  1. Support Staff 1 Posted by Pat Allan on 26 Jul, 2016 03:15 PM

    Pat Allan's Avatar

    Hi Erhan

    Can you let me know what the app in question is? I’ll have a look at how the delta index processing is going on the Flying Sphinx side of things.

    Also: how often are you processing all indices (via `heroku run bundle exec flying-sphinx index` or `heroku run rake fs:index`)? And it’s important to note that the delayed delta jobs won’t set the delta flag back to true - this only happens when all the indices are processed. That said, records should certainly appear in results!

    Thanks,


    Pat

  2. 2 Posted by Erhan Berber on 26 Jul, 2016 03:31 PM

    Erhan Berber's Avatar

    Hi Pat,

    Thanks so much for your help.

    My app is "fh7020".

    I see the thinkingsphinx delayed_job record being created after the record update, and I see it getting deleted as well. But, updated record doesn't get returned in search until I do a rake fs:index manually. And the "delta" field is set to true and stays that way till I do a full rake fs:index as well.

    Isn't delayeddelta supposed to re-index updated records automatically through the delayed job? Am I still supposed to schedule to index every so often? Maybe I am missing how delta works?

    Thank you so much for your help again. Looking forward to your response.

    -Erhan

  3. Support Staff 3 Posted by Pat Allan on 27 Jul, 2016 01:26 AM

    Pat Allan's Avatar

    Hi Erhan,

    Good to know the jobs are being created. From what I can see, deltas are being processed in your app (you can check this from your dashboard as well via `heroku addons:open flying_sphinx`).

    Deltas do re-index updated records automatically, but the advantage of delta indices is that they only cover a small number of records, and thus are super fast to process. The more changes you make to records, the larger the delta indices grow, and thus the slower they become to process. Processing _all indices_ will bring all the delta changes into the core indices, and return the delta index to being empty - thus, keeping them small and fast.

    So, even with using delayed delta, you’ll still want to run the command to process all indices regularly - whether that be daily or hourly, depending on the amount of data you have. From what I can see, your overall amount of data is not super large, so running the task daily should be fine.

    Now of course, this is not quite related to the issue you’ve been seeing where updated records aren’t matching search queries. If this issue is still happening, can you run me through the index definition, the search query you’re running, and the versions of the thinking-sphinx, ts-delayed-delta and flying-sphinx gems you’re using?

    Cheers,

    Pat

  4. 4 Posted by Erhan Berber on 27 Jul, 2016 04:28 AM

    Erhan Berber's Avatar

    Hi Pat,

    Thank you for your responsiveness as usual.

    My first question: Did you really mean to say "do" or "don't" in "Deltas do
    re-index updated records automatically" ?

    To answer your question about search results not matching. My issue is that
    the delta=true results are not being returned at all. This is still
    happening.

    My index definition:
    ThinkingSphinx::Index.define :user, :with => :active_record, :delta =>
    ThinkingSphinx::Deltas::DelayedDelta do
      indexes [first_name, last_name], :as => :name, :sortable => true
      indexes last_name, :sortable => true
      indexes email, :sortable => true

      set_property :enable_star => 1
      set_property :min_infix_len => 1
    end

    In my user model I have a sphinx scope as:
      sphinx_scope(:by_name) { |name|
        {:conditions => {:name => "#{name}*"}}
      }

    When I do a search like below it only returns the records where delta != true
    User.by_name("wick")

    My related gems are:
    gem 'mysql2', '~> 0.3.13'
    gem 'thinking-sphinx', '3.1.0'
    gem 'flying-sphinx', '1.2.0'
    gem 'ts-delayed-delta' (in gemfile.lock (2.0.2) is the version of
    ts-delayed-delta)

    Thank you so much again for your help. Looking forward to your response.

    -Erhan

  5. Support Staff 5 Posted by Pat Allan on 27 Jul, 2016 05:13 AM

    Pat Allan's Avatar

    Hi Erhan,

    Thanks for all of those details. ‘do’ is what I meant - the delta processing DOES process the delta index automatically if you’re modifying your model’s records in ways that fire standard ActiveRecord callbacks - and given the delta flag is automatically being set to true, and you’ve mentioned the jobs are being added to the DJ queue, it sounds like that side of things is working as expected.

    Looking back over the past 24 hours or so, it seems the only two delta requests that have made it through to Flying Sphinx are for the Challenge model. I just manually ran a delta request for the User model, and now there is a search result there matching ‘wick’. So, it seems the issue is possibly model-specific and perhaps somewhere in the connection between Delayed Job and the Flying Sphinx API. Are you seeing any exceptions when the delta job is processed by Delayed Job?


    Pat

  6. 6 Posted by Erhan Berber on 27 Jul, 2016 07:40 AM

    Erhan Berber's Avatar

    Hi Pat,

    Thank you for the great response.

    From what I see in the logs the DJ entry seems to be created normally after a
    user record update. Then the queue seems to be getting processed as well, but
    it seems to me like somehow the updated record's index is removed from sphinx
    core or maybe just omitted from search results somehow?

    And isn't the delta flag supposed to get set back to false after index is
    automatically rebuilt through DJ? Like I said before it gets stuck at "true"
    until the next re-indexing.

    You are right, the "Challenge" model seems to be working fine. When I looked
    at the logs after a challenge update I see the following lines in the logs
    during DJ processing that I am not seeing after a "User" update. What could
    be causing this? I am doing a regular update through activerecord.

    2016-07-27T07:33:35.398784+00:00 app[worker.1]:
    [Worker(host:8f8492bd-cf16-4c34-95ec-76a916ecc7f9 pid:3)] Job Thinking Sphinx:
    Process challenge_delta (id=1893364) COMPLETED after 1.0858
    2016-07-27T07:33:35.398851+00:00 app[worker.1]: 2016-07-27T07:33:35+0000:
    [Worker(host:8f8492bd-cf16-4c34-95ec-76a916ecc7f9 pid:3)] Job Thinking Sphinx:
    Process challenge_delta (id=1893364) COMPLETED after 1.0858

    And below is the logs right after a "User" update:

    2016-07-27T07:26:38.039845+00:00 app[web.1]: SQL (0.7ms) UPDATE "users" SET
    "perishable_token" = $1, "updated_at" = $2 WHERE "users"."id" = $3
    [["perishable_token", "JSUTqclb4CsU7O9k1cbp"], ["updated_at", "2016-07-27
    07:26:38.037492"], ["id", 4]]
    2016-07-27T07:26:38.042310+00:00 app[web.1]: (1.7ms) COMMIT
    2016-07-27T07:26:38.044156+00:00 app[web.1]: (0.8ms) SELECT COUNT(*) FROM
    "delayed_jobs" WHERE "delayed_jobs"."handler" = $1 AND
    "delayed_jobs"."locked_at" IS NULL [["handler", "---
    !ruby/object:ThinkingSphinx::Deltas::DelayedDelta::DeltaJob\nindex:
    user_delta\n"]]
    2016-07-27T07:26:38.045707+00:00 app[web.1]: (0.6ms) BEGIN
    2016-07-27T07:26:38.047718+00:00 app[web.1]: SQL (0.7ms) INSERT INTO
    "delayed_jobs" ("handler", "run_at", "created_at", "updated_at") VALUES ($1,
    $2, $3, $4) RETURNING "id" [["handler", "---
    !ruby/object:ThinkingSphinx::Deltas::DelayedDelta::FlagAsDeletedJob\nindex:
    user_core\ndocument_id: 38\n"], ["run_at", "2016-07-27 07:26:38.045885"],
    ["created_at", "2016-07-27 07:26:38.046141"], ["updated_at", "2016-07-27
    07:26:38.046141"]]
    2016-07-27T07:26:38.049633+00:00 app[web.1]: (1.5ms) COMMIT
    2016-07-27T07:26:38.051274+00:00 app[web.1]: Rendered users/_user_row.haml
    (0.8ms)
    2016-07-27T07:26:38.051423+00:00 app[web.1]: Rendered users/update.js.erb
    (1.1ms)
    2016-07-27T07:26:38.049187+00:00 app[worker.1]:
    Delayed::Backend::ActiveRecord::Job Load (1.7ms) UPDATE "delayed_jobs" SET
    locked_at = '2016-07-27 07:26:38.046718', locked_by =
    'host:8f8492bd-cf16-4c34-95ec-76a916ecc7f9 pid:3' WHERE id IN (SELECT
    "delayed_jobs"."id" FROM "delayed_jobs" WHERE ((run_at <= '2016-07-27
    07:26:38.046046' AND (locked_at IS NULL OR locked_at < '2016-07-27
    03:26:38.046089') OR locked_by = 'host:8f8492bd-cf16-4c34-95ec-76a916ecc7f9
    pid:3') AND failed_at IS NULL) ORDER BY priority ASC, run_at ASC LIMIT 1 FOR
    UPDATE) RETURNING *
    2016-07-27T07:26:38.059470+00:00 app[web.1]: Completed 200 OK in 34ms (Views:
    1.5ms | ActiveRecord: 12.1ms)
    2016-07-27T07:26:38.057959+00:00 heroku[router]: at=info method=POST
    path="/users/4" host=app.70-20.com
    request_id=8ff92224-bf32-43e3-9774-935054f6a3cf fwd="5.25.3.251" dyno=web.1
    connect=1ms service=184ms status=200 bytes=1900
    2016-07-27T07:26:43.056718+00:00 app[worker.1]:
    Delayed::Backend::ActiveRecord::Job Load (5.3ms) UPDATE "delayed_jobs" SET
    locked_at = '2016-07-27 07:26:43.050669', locked_by =
    'host:8f8492bd-cf16-4c34-95ec-76a916ecc7f9 pid:3' WHERE id IN (SELECT
    "delayed_jobs"."id" FROM "delayed_jobs" WHERE ((run_at <= '2016-07-27
    07:26:43.050070' AND (locked_at IS NULL OR locked_at < '2016-07-27
    03:26:43.050105') OR locked_by = 'host:8f8492bd-cf16-4c34-95ec-76a916ecc7f9
    pid:3') AND failed_at IS NULL) ORDER BY priority ASC, run_at ASC LIMIT 1 FOR
    UPDATE) RETURNING *
    2016-07-27T07:26:43.058427+00:00 app[worker.1]:
    [Worker(host:8f8492bd-cf16-4c34-95ec-76a916ecc7f9 pid:3)] Job Thinking Sphinx:
    Mark 38 in user_core as deleted (id=1893345) RUNNING
    2016-07-27T07:26:43.058514+00:00 app[worker.1]: 2016-07-27T07:26:43+0000:
    [Worker(host:8f8492bd-cf16-4c34-95ec-76a916ecc7f9 pid:3)] Job Thinking Sphinx:
    Mark 38 in user_core as deleted (id=1893345) RUNNING
    2016-07-27T07:26:43.066794+00:00 app[worker.1]: (1.1ms) BEGIN
    2016-07-27T07:26:43.071514+00:00 app[worker.1]: SQL (3.1ms) DELETE FROM
    "delayed_jobs" WHERE "delayed_jobs"."id" = $1 [["id", 1893345]]
    2016-07-27T07:26:43.075452+00:00 app[worker.1]: (3.0ms) COMMIT
    2016-07-27T07:26:43.076120+00:00 app[worker.1]:
    [Worker(host:8f8492bd-cf16-4c34-95ec-76a916ecc7f9 pid:3)] Job Thinking Sphinx:
    Mark 38 in user_core as deleted (id=1893345) COMPLETED after 0.0175
    2016-07-27T07:26:43.076204+00:00 app[worker.1]: 2016-07-27T07:26:43+0000:
    [Worker(host:8f8492bd-cf16-4c34-95ec-76a916ecc7f9 pid:3)] Job Thinking Sphinx:
    Mark 38 in user_core as deleted (id=1893345) COMPLETED after 0.0175

    Eager to hear your thoughts. Thank you so much again.

    -Erhan

  7. Support Staff 7 Posted by Pat Allan on 27 Jul, 2016 07:57 AM

    Pat Allan's Avatar

    I’ve got to duck out for something, but just quickly: it looks like the delta job for user_delta isn’t being run - or even perhaps persisted? Are there any old, failing jobs? ts-delayed-delta won’t double-up on delta requests if it doesn’t have to, but it should be checking only active jobs… still, worth confirming.


    Pat

  8. 8 Posted by Erhan Berber on 27 Jul, 2016 10:24 AM

    Erhan Berber's Avatar

    Hi Pat,

    I think you got it. After I emptied the delayed_job queue, the search started
    working just fine. Wow!

    Thank you so much, Pat. Sorry to bug you with this, it has been a great
    exercise for me though.

    I have one final question, do you still suggest me to do a full index every
    day or so? What would be the reason for that if so?

    -Erhan

  9. Support Staff 9 Posted by Pat Allan on 27 Jul, 2016 02:06 PM

    Pat Allan's Avatar

    Great to hear it’s working again :)

    I guess there’s room for improvement in the ts-delayed-delta gem… the check for existing jobs should be a bit fussier.

    I understand the need to run a full index regularly is a bit counter-intuitive… I’ll try to explain with a broad example. Let’s say you have one thousand users. One user object changes, and so its delta flag is set to true, and the delta index processes super quickly - it only has to deal with that single record.

    Then some time passes, and four more users have had changes, with their delta flags set to true. Again, the delta index processes each time, and it’s still pretty quick, as there are only five records involved.

    But over the course of a week, it could be that 200 users have had updates to their data, which means 200 user records with delta flags set to true. The delta indexing after each update is now a fair bit slower - there’s a bunch more data being processed. This means that search results may not be returning accurately quite as quickly as you were hoping.

    If you never run another full index, you could end up with *all* users having a delta flag set to true, and then delta processing will be much slower again, meaning accurate search results take longer to be returned.

    If you run a full index regularly, then the core index has the latest changes available, all users have their delta flags set back to false, and then the next update to a user means there’s just a single delta record, and delta processing is super fast again.

    Does that help clarify things?

    Also, just to explain why the delta flag remains true after a delta processing… user A is edited, has their delta flag set, and the changes to their data are processed in the delta index. If the delta flag was then reset to false, and then User B is edited, B’s latest changes will now be present in the delta index, but A’s wouldn’t be, because the delta index only processes records where delta is set to true. Thus, A’s data would not be present at all in search results until a full index.

    Any further questions or issues, do let me know!

    Cheers


    Pat

  10. 10 Posted by Erhan Berber on 27 Jul, 2016 08:12 PM

    Erhan Berber's Avatar

    Hi Pat,

    Thank you for the great explanations of both the need of the full-indexing and
    the reason why the delta flag remains at true. It all makes sense and really
    helped me understand these two important subjects.

    I am all good now and really love using sphinx. Thank you for your
    responsiveness and great help.

    Stay well.

    -Erhan

Reply to this discussion

Internal reply

Formatting help / Preview (switch to plain text) No formatting (switch to Markdown)

Attaching KB article:

»

Attached Files

You can attach files up to 10MB

If you don't have an account yet, we need to confirm you're human and not a machine trying to post spam.

Keyboard shortcuts

Generic

? Show this help
ESC Blurs the current field

Comment Form

r Focus the comment reply box
^ + ↩ Submit the comment

You can use Command ⌘ instead of Control ^ on Mac