Are our wordform and stopword files being used?

Sean Massa

20 Oct, 2020 10:08 PM

We talked before about how it looked like our wordform or stopword files were not being found in production. We just deployed significant changes to these files that we'd like Sphinx to pick up. Can you confirm whether they are being found in production?

Thanks!

  1. Posted by Pat Allan (Support Staff) on 21 Oct, 2020 12:22 PM

    Hi Sean,

    Sorry for not responding sooner. I’m just looking into this now, and it seems neither the wordforms file nor the stopwords file is coming through with the configuration. Are you able to share the code you’re using to generate the configuration, as covered here: https://github.com/flying-sphinx/flying-sphinx-js#configuration

    Cheers,


    Pat

  2. Posted by Sean Massa on 21 Oct, 2020 09:10 PM

    No worries.

    I tried updating the configuration to this. Can you check again and let us know if this is working?

    ```
    const fs = require('fs');
    const path = require('path');
    const flyingSphinx = require('flying-sphinx');

    // Directory holding sphinx.conf, wordforms.txt and stopwords.txt.
    const searchDir = path.join(__dirname, '..', 'src', 'index-search');

    const configuration = flyingSphinx.configuration();
    configuration.process('rebuild', function(configurer) {
        configurer.addEngine('sphinx');
        configurer.addVersion('2.2.11');

        // The main Sphinx configuration file.
        configurer.addConfiguration(fs.readFileSync(path.join(searchDir, 'sphinx.conf')));

        // Supporting files referenced by the configuration.
        configurer.addSettingFile('wordforms', 'wordforms.txt',
            fs.readFileSync(path.join(searchDir, 'wordforms.txt')));
        configurer.addSettingFile('stopwords', 'stopwords.txt',
            fs.readFileSync(path.join(searchDir, 'stopwords.txt')));
    });
    ```

    I do see this warning in our index command output:

    > WARNING: stopwords: failed to get file size for '/mnt/local/flying-sphinx/552dde8b36b080ec0/stopwords/stopwords.txt'

  3. Posted by Pat Allan (Support Staff) on 22 Oct, 2020 12:39 AM

    Hi Sean,

    The uploaded configuration is still lacking those two additional files, but what you’re doing seems correct.

    Two things to note, just to confirm we’re on the right track:

    - The first argument to the `process` call is the underlying command that gets invoked, so in the code you’ve shared it’s ‘rebuild’. However, I’m not seeing a rebuild command come through (rather, separate stop/start/index/configure commands). Are you invoking this code as a replacement for the built-in flying-sphinx commands?
    - You may also want to switch from ‘rebuild’ to ‘configure’: that way it *just* updates the configuration, rather than stopping the daemon, reconfiguring, indexing, and then starting the daemon again.

    I’m going to review the underlying flying-sphinx-js code to confirm it’s behaving as we’re hoping as well!

    Cheers,


    Pat

  4. Posted by Sean Massa on 22 Oct, 2020 03:20 PM

    Ah, you are right. This code was not executed.

    I changed it to "configure", ran it, and then rebuilt the index.

    Based on some test searches in production, though, I don't think it's picking up our wordforms yet.

  5. Posted by Sean Massa on 22 Oct, 2020 04:03 PM

    I did some more searching. We sometimes get results that imply the wordforms are working, but not always. We even added a unique term that maps to a term we know appears in some of our documents, reindexed, searched for the unique term, and found nothing.

  6. Posted by Pat Allan (Support Staff) on 22 Oct, 2020 10:29 PM

    I can confirm that the new configuration archive being sent isn’t making it through to the server. The tar.gz file being generated (which includes the Sphinx configuration file alongside the stopwords/wordforms files) is somehow invalid: the Ruby code on my servers can’t read it, and neither can my Mac. So it sounds like there’s a bug in flying-sphinx-js (or in the underlying libraries it depends on?) that I need to fix. I’ll let you know when I have a new version of the library ready!
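
    A minimal Node sketch of the kind of check that exposes an unreadable archive, assuming you have a copy of the generated `.tar.gz` on disk: it gunzips the data with the built-in `zlib` module and looks for the `ustar` magic string at byte offset 257 of the first 512-byte tar header. `looksLikeValidTarGz` and `checkArchiveFile` are hypothetical helper names. This only catches gross corruption: it doesn't walk every entry, and very old pre-POSIX tar files (which lack the magic string) would be reported as invalid.

    ```javascript
    const fs = require('fs');
    const zlib = require('zlib');

    // Returns true if `buffer` gunzips cleanly and its first 512-byte block
    // carries the ustar magic. A quick sanity check only; entries are not walked.
    function looksLikeValidTarGz(buffer) {
      let tar;
      try {
        tar = zlib.gunzipSync(buffer);
      } catch (err) {
        return false; // not valid gzip data
      }
      if (tar.length < 512) {
        return false; // too short to hold even one tar header block
      }
      // POSIX ("ustar\0") and GNU ("ustar ") headers both start with "ustar" at offset 257.
      return tar.subarray(257, 262).toString('ascii') === 'ustar';
    }

    // Reads an archive from disk and runs the check above on its contents.
    function checkArchiveFile(filePath) {
      return looksLikeValidTarGz(fs.readFileSync(filePath));
    }
    ```

    For example, `looksLikeValidTarGz(Buffer.from('not a gzip file'))` returns `false`, because `gunzipSync` rejects the data.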

  7. Posted by Sean Massa on 23 Oct, 2020 07:30 AM

    ok, thanks!

    Let me know if there's anything we can do to help debug the issue.

  8. Posted by Pat Allan (Support Staff) on 23 Oct, 2020 11:52 AM

    I’ve just published v1.1.0 of the flying-sphinx package - I’ve switched out one arching library for another, and tested a script very similar to yours for uploading configuration. So, if you can update to this new release and give it a spin, that’d be great!

    Cheers,


    Pat

  9. Posted by Pat Allan (Support Staff) on 23 Oct, 2020 11:53 AM

    … archiving, not arching.

  10. Posted by Sean Massa on 23 Oct, 2020 03:29 PM

    Thanks!

    I updated our package and re-ran our configure and index steps. The latest index output no longer complains about the stopwords file size, but nothing in it says whether the stopwords or wordforms files were found.

    Some test searches make it seem like the wordforms are not being used. Can you confirm?
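
    Since the indexer output doesn't say whether these files were picked up, a rough local check can at least confirm that the sphinx.conf being uploaded sets `wordforms` and `stopwords` for each index. The sketch below is a hypothetical helper, not part of flying-sphinx; it assumes the usual `index name { key = value }` layout with `#` comments and ignores backslash line continuations. It says nothing about whether the server actually found the files.

    ```javascript
    // Hypothetical helper: lists the wordforms/stopwords settings found in a
    // sphinx.conf string, tagged with the index block they appear under.
    function findDictionarySettings(confText) {
      const results = [];
      let currentIndex = null;
      for (const rawLine of confText.split(/\r?\n/)) {
        const line = rawLine.replace(/#.*$/, '').trim(); // strip comments
        const header = line.match(/^index\s+([\w-]+)/);
        if (header) {
          currentIndex = header[1]; // e.g. "index articles" or "index a : b"
          continue;
        }
        const setting = line.match(/^(wordforms|stopwords)\s*=\s*(.+)$/);
        if (setting) {
          results.push({ index: currentIndex, setting: setting[1], value: setting[2].trim() });
        }
      }
      return results;
    }
    ```

    An index missing from the result, or one listing only one of the two settings, would be worth a closer look before re-running the index.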

  11. Posted by Pat Allan (Support Staff) on 24 Oct, 2020 01:07 AM

    Hi Sean,

    I can confirm that both files are coming through and being included in the configuration file correctly.

    As to whether they’re working: if there are particular queries you’re running that aren’t returning the right records, let me know. In the meantime, I did some very simple tests:

    - Using the first wordform, “clinical abstractor > clinical_abstractor”, I ran queries on both of those as search terms, and they return the exact same results. Sphinx’s keyword information in the query response also suggests it’s functioning correctly, as it returns the keyword “clinical_abstractor” even when I search for the separate words (clinical abstractor).
    - For the stopwords, I searched for ‘hour’ (the fifth line in the stopwords file), and no results were returned, which is what I’d expect.
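
    Both tests rely on how wordforms and stopwords rewrite query keywords. The following is a deliberately simplified JavaScript model of that effect, not Sphinx's actual tokenizer: it lower-cases and splits on whitespace, treats `_` as part of a word, replaces multi-word wordform sources with their destination keyword, and then drops stopwords. Real tokenization and processing order depend on Sphinx settings such as `charset_table`, so treat `normalise` (a hypothetical name) as illustration only.

    ```javascript
    // Simplified model only; NOT Sphinx's real tokenizer or processing order.
    // wordforms: array of [source, destination] pairs; sources may be multi-word.
    // stopwords: Set of lower-case words to discard.
    function normalise(query, wordforms, stopwords) {
      const tokens = query.toLowerCase().split(/\s+/).filter(Boolean);
      const keywords = [];
      let i = 0;
      while (i < tokens.length) {
        let matched = false;
        for (const [source, destination] of wordforms) {
          const parts = source.toLowerCase().split(/\s+/);
          const candidate = tokens.slice(i, i + parts.length);
          // A wordform fires only when every source word matches in sequence.
          if (candidate.length === parts.length &&
              candidate.every((token, k) => token === parts[k])) {
            keywords.push(destination);
            i += parts.length;
            matched = true;
            break;
          }
        }
        if (!matched) {
          keywords.push(tokens[i]);
          i += 1;
        }
      }
      // Stopwords are removed from whatever keywords remain.
      return keywords.filter((keyword) => !stopwords.has(keyword));
    }
    ```

    With `[['clinical abstractor', 'clinical_abstractor']]` as the wordforms and `new Set(['hour'])` as the stopwords, both `clinical abstractor` and `clinical_abstractor` normalise to `['clinical_abstractor']`, while `hour` normalises to an empty list, mirroring the two observations above.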

    So I feel like the files are having the appropriate impact - but yeah, if there’s something that doesn’t look right to you, do let me know.

    Cheers,


    Pat

  12. Posted by Sean Massa on 24 Oct, 2020 05:28 PM

    We do see some results that make it look like things are working.

    However, we put in a unique term to try to make sure. Perhaps we did this improperly, though. You can see a mapping where we added `poopsmith`:

    ```
    plumber, master plumber, service plumber, poopsmith => plumber
    ```

    But searching for that returns nothing.

    If this is an issue with how we're using Sphinx rather than how you're hosting it, we're happy to figure it out on our own. If you have any additional guidance, though, we'd appreciate it.

    Thanks for the help!

  13. Posted by Pat Allan (Support Staff) on 25 Oct, 2020 01:45 AM

    Hi Sean,

    I’ve just been reading through the docs, and I’m not sure whether Sphinx expects comma-separated terms in wordforms files:
    http://sphinxsearch.com/docs/current.html#conf-wordforms
    So you might need to split this example into a few lines instead.
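
    A minimal sketch of that split, assuming your comma-separated lines have the form `a, b, c => destination` and that Sphinx wants one `source > destination` mapping per line (as in the working “clinical abstractor > clinical_abstractor” entry). `expandWordforms` is a hypothetical helper; lines without a comma are passed through unchanged, and identity mappings such as `plumber > plumber` are dropped because they have no effect.

    ```javascript
    // Hypothetical helper: expands "a, b, c => dest" (or "a, b, c > dest")
    // into one "source > dest" line per source term.
    function expandWordforms(text) {
      const out = [];
      for (const line of text.split(/\r?\n/)) {
        const match = line.match(/^(.+?)\s*=?>\s*(.+)$/);
        if (!match || !match[1].includes(',')) {
          out.push(line); // comments, blank lines, single-source mappings
          continue;
        }
        const destination = match[2].trim();
        for (const source of match[1].split(',')) {
          const term = source.trim();
          // Skip empty fragments and terms that already equal the destination.
          if (term && term !== destination) {
            out.push(`${term} > ${destination}`);
          }
        }
      }
      return out.join('\n');
    }
    ```

    Run over the `poopsmith` example, this yields three lines: `master plumber > plumber`, `service plumber > plumber` and `poopsmith > plumber`. The result could then be passed to `addSettingFile` in place of the raw file contents.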

    I don’t think this is related to Flying Sphinx, but I’m certainly happy to suggest ways to debug the issue if I can think of anything!

    Cheers,


    Pat

  14. Posted by Sean Massa on 26 Oct, 2020 06:13 PM

    The docs weren't clear on this, but I changed the format to match and it worked!

    Thanks for your help!

  15. Posted by Pat Allan (Support Staff) on 26 Oct, 2020 11:40 PM

    Great to hear it’s working! :)
