Non-ASCII characters being ignored?

Roger K. Kristiansen

02 May, 2017 03:12 PM

Hi,

I just noticed an issue in production that was not a problem before. Not sure when it started, though.

Symptoms:

If I search for a string containing non-ASCII characters, the results don't make much sense. I've tried letters like æ, ø and å, as well as some kanji characters for good measure. If I search exclusively for these characters, they seem to be ignored entirely and every entry containing a non-ASCII character is returned.

Some additional facts:

* This does not occur in my development environment.
* I've tried rebuilding the index in both locations.
* Sphinx 2.2.9 in development
* thinking-sphinx 3.3.0
* flying-sphinx 1.2.0
* thinking_sphinx.yml attached

Any idea what might be going on here?

Thanks,
Roger

  1. Posted by Pat Allan (Support Staff) on 02 May, 2017 03:32 PM

    Hi Roger,

    It looks like the YAML file didn't make it through - can you try attaching it again? And also, what's the name of the app where this problem is occurring?

    Cheers,

    -- Pat

  2. Posted by Roger K. Kristi... on 03 May, 2017 06:58 AM

    Okay, here's another go at uploading the YAML file. The app is "legelisten".

    Cheers,
    Roger

  3. Posted by Pat Allan (Support Staff) on 03 May, 2017 09:09 AM

    On Flying Sphinx, you’re using Sphinx v2.1.4 - if you set `version: 2.2.9` in the production environment of config/thinking_sphinx.yml and then run the rebuild command, that should switch everything over. I’m expecting that’ll help, given that Sphinx 2.2.x now uses UTF-8 by default.
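    For reference, here is a minimal Ruby sketch showing where the `version` key sits in config/thinking_sphinx.yml, reading it back per environment. The sample YAML is hypothetical except for `version: 2.2.9` under production (the real file was an attachment and isn't shown), and `pinned_sphinx_version` is a made-up helper, not part of thinking-sphinx or flying-sphinx.

    ```ruby
    require "yaml"

    # Hypothetical excerpt of config/thinking_sphinx.yml. Only the
    # production `version: 2.2.9` entry comes from this thread; the
    # min_infix_len values are placeholders.
    SAMPLE_CONFIG = <<~YAML
      development:
        min_infix_len: 3
      production:
        version: 2.2.9
        min_infix_len: 3
    YAML

    # Hypothetical helper: returns the Sphinx version pinned for an
    # environment as a string, or nil when that environment has no
    # `version` key (or no section at all).
    def pinned_sphinx_version(yaml_text, environment)
      settings = YAML.safe_load(yaml_text) || {}
      env_settings = settings[environment] || {}
      env_settings["version"]&.to_s
    end

    puts pinned_sphinx_version(SAMPLE_CONFIG, "production") # prints 2.2.9
    p pinned_sphinx_version(SAMPLE_CONFIG, "development")   # prints nil
    ```

    An environment without a `version` key returns nil here; per the thread, production without that key was running Sphinx v2.1.4 on Flying Sphinx.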

    Of course, if that doesn’t help, do let me know!

  4. Posted by Roger K. Kristi... on 03 May, 2017 12:12 PM

    Thanks Pat, that seems to have fixed it! I'm a little stumped, though, as it was definitely working properly at some point.

    Anyway, I wasn't aware that I could control the Sphinx version, but now I see there's even a section about it in the docs. Since Flying Sphinx clearly supports 2.2.9, you may want to update the documentation under the "Sphinx Versions (Ruby Only)" heading, as it doesn't mention 2.2.9 at all. :-)

    Thanks a bunch,
    Roger

  5. Posted by Roger K. Kristi... on 03 May, 2017 02:44 PM

    Actually, now I see another issue and I'm wondering if it might be somewhat related.

    In production and staging it seems substring matching is not working as it should, but it is working correctly in development.

    In dev I can type any substring of what I want to fetch and have it returned as expected, but on Heroku I only seem to get a match if I type a complete word. For example, searching for an entry named "Lysgaard":

    * In dev, "lys", "ysga" and "gaard" all give a match
    * On Heroku, only typing the whole word "lysgaard" gives a match. No substring of that word matches.

    Would you happen to have any idea what might be going on there?

  6. Posted by Pat Allan (Support Staff) on 04 May, 2017 07:21 AM

    Hi Roger,

    I’m a bit baffled by this issue, especially since you’re now using the same version of Sphinx locally and on Flying Sphinx, and you’ve got `min_infix_len` set.

    It looks like wildcard searches (e.g. `lys*`) work fine, though? That’s certainly what I’d expect, so at least that’s something.
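    To make the expected development behaviour concrete, here is a simplified Ruby model of infix matching, assuming a hypothetical `min_infix_len` of 3 (the real value is in the attached YAML, which isn't shown). It treats every substring of at least `min_infix_len` characters as matchable, and models a trailing-star query such as `lys*` as a prefix match. This is an illustration only, not how Sphinx actually stores or searches its index; the real behaviour also depends on other index settings.

    ```ruby
    # Hypothetical value; the thread doesn't show the real min_infix_len.
    MIN_INFIX_LEN = 3

    # Every substring of `word` that is at least `min_len` characters long.
    def infixes(word, min_len)
      word = word.downcase
      all = (0...word.length).flat_map do |start|
        (min_len..(word.length - start)).map { |len| word[start, len] }
      end
      all.uniq
    end

    # Simplified infix match: the query must be a long-enough substring.
    def infix_match?(document_word, query, min_len = MIN_INFIX_LEN)
      needle = query.downcase
      needle.length >= min_len && infixes(document_word, min_len).include?(needle)
    end

    # Simplified trailing-wildcard match such as "lys*": a prefix match.
    def wildcard_match?(document_word, query)
      document_word.downcase.start_with?(query.downcase.chomp("*"))
    end

    %w[lys ysga gaard lysgaard ly].each do |query|
      puts "#{query}: #{infix_match?('Lysgaard', query)}"
    end
    puts "lys*: #{wildcard_match?('Lysgaard', 'lys*')}"
    ```

    Under these assumptions "lys", "ysga", "gaard" and "lysgaard" all match, while "ly" is too short, which mirrors what Roger saw in development.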

    One thing that might be worth trying is manually running each step of the rebuild process on staging, to see if that helps: stop, configure, index, start.
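    The manual sequence above could be scripted along these lines. This is a dry-run Ruby sketch: the `fs:stop`, `fs:configure`, `fs:index` and `fs:start` rake task names and the `heroku run --app` invocation are assumptions to check against the flying-sphinx and Heroku documentation, and `your-staging-app` is a hypothetical app name. By default it only prints the commands.

    ```ruby
    # Step order taken from the thread: stop, configure, index, start.
    REBUILD_STEPS = %w[stop configure index start].freeze

    # Assumed command shape: `heroku run --app APP rake fs:STEP`.
    def manual_rebuild_commands(app)
      REBUILD_STEPS.map { |step| ["heroku", "run", "--app", app, "rake", "fs:#{step}"] }
    end

    # Prints each command; with dry_run: false it also runs them in order
    # and aborts at the first one that fails.
    def run_manual_rebuild(app, dry_run: true)
      manual_rebuild_commands(app).each do |command|
        puts command.join(" ")
        next if dry_run

        abort("Step failed: #{command.join(' ')}") unless system(*command)
      end
    end

    run_manual_rebuild("your-staging-app") # hypothetical app name; dry run only
    ```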

    I’ll keep thinking about possible reasons this is happening.

  7. Posted by Roger K. Kristi... on 04 May, 2017 08:07 AM

    I don't know what happened here – perhaps I did something weird yesterday or I'm just going insane – but as of now all these substring searches also work in staging and production.

    As a side note, I did try your suggestion of a manual rebuild in staging before even checking whether the problem still existed, but when I checked production just to compare the results, I noticed it was working there too, without any intervention on my part.

    So for now, everything is peachy. Thanks again for helping me debug!

  8. Posted by Pat Allan (Support Staff) on 04 May, 2017 11:40 AM

    No worries! Glad to know it is working, even if a little mysteriously :)
