Are our wordform and stopword files being used?
We talked before about how it looked like our wordform or stopword files were not being found in production. We just deployed significant changes to these files that we'd like to be picked up. Can you confirm whether or not they are being found properly in production?
Thanks!
Support Staff 1 Posted by Pat Allan on 21 Oct, 2020 12:22 PM
Hi Sean,
Sorry for not responding sooner. Just looking into this now, and it seems neither the wordform nor the stopword files are coming through with the configuration. Are you able to share the code you’re using to generate the configuration, as per what’s covered here: https://github.com/flying-sphinx/flying-sphinx-js#configuration
Cheers,
—
Pat
2 Posted by Sean Massa on 21 Oct, 2020 09:10 PM
No worries.
I tried updating the configuration to this. Can you check again and let us know if this is working?
```
// Assumes the library is installed under the npm package name 'flying-sphinx'.
const fs = require('fs');
const flyingSphinx = require('flying-sphinx');

let configuration = flyingSphinx.configuration();
configuration.process('rebuild', function(configurer) {
  configurer.addEngine('sphinx');
  configurer.addVersion('2.2.11');
  // Main Sphinx configuration file.
  let fullSphinxConfig = fs.readFileSync(__dirname + '/../src/index-search/sphinx.conf');
  configurer.addConfiguration(fullSphinxConfig);
  // Extra setting files referenced by the configuration.
  let wordforms = fs.readFileSync(__dirname + '/../src/index-search/wordforms.txt');
  configurer.addSettingFile('wordforms', 'wordforms.txt', wordforms);
  let stopwords = fs.readFileSync(__dirname + '/../src/index-search/stopwords.txt');
  configurer.addSettingFile('stopwords', 'stopwords.txt', stopwords);
});
```
I do see this warning in our index command output:
> WARNING: stopwords: failed to get file size for '/mnt/local/flying-sphinx/552dde8b36b080ec0/stopwords/stopwords.txt'
Support Staff 3 Posted by Pat Allan on 22 Oct, 2020 12:39 AM
Hi Sean,
The uploaded configuration is still lacking those two additional files, but what you’re doing seems correct. 🤔
Two things to note, just to confirm we’re on the right track:
The first argument for the process call is the underlying command that gets invoked - so, in the code you’ve shared, it’s ‘rebuild’. However, I’m not seeing a rebuild command come through (rather, separate stop/start/index/configure commands). Are you invoking the code below as a replacement to the built-in flying-sphinx commands?
And you may want to switch from ‘rebuild’ to ‘configure’ - as that way, it is *just* updating the configuration, rather than stopping the daemon, reconfiguring, indexing, and then starting the daemon again.
I’m going to review the underlying flying-sphinx-js code to confirm it’s behaving as we’re hoping as well!
Cheers,
—
Pat
4 Posted by Sean Massa on 22 Oct, 2020 03:20 PM
Ah, you are right. This code was not executed.
I updated it to use "configure", ran it, then rebuilt the index.
Based on some test searches in production, I think it still didn't find our wordforms.
5 Posted by Sean Massa on 22 Oct, 2020 04:03 PM
I did some more searching. It seems like we'll sometimes get results that imply the wordforms are working, but not always. We even added a unique term that maps to a term that definitely shows up in some of our documents, reindexed, searched for the unique term, and found nothing.
Support Staff 6 Posted by Pat Allan on 22 Oct, 2020 10:29 PM
I can confirm that the new configuration archive being sent isn’t making it through to the server. The tar.gz file being generated (which includes the Sphinx configuration file alongside the stopwords/wordforms files) is somehow invalid - the Ruby code on my servers can’t read it, and nor can my Mac. So, sounds like there’s a bug in flying-sphinx-js I need to fix (or the underlying libraries it’s depending on? 🤔). I will let you know when I’ve a new version of the library ready!
7 Posted by Sean Massa on 23 Oct, 2020 07:30 AM
ok, thanks!
Let me know if there's anything we can do to help debug the issue.
Support Staff 8 Posted by Pat Allan on 23 Oct, 2020 11:52 AM
I’ve just published v1.1.0 of the flying-sphinx package - I’ve switched out one arching library for another, and tested a script very similar to yours for uploading configuration. So, if you can update to this new release and give it a spin, that’d be great!
Cheers,
—
Pat
Support Staff 9 Posted by Pat Allan on 23 Oct, 2020 11:53 AM
… archiving, not arching. 🤷‍♂️
10 Posted by Sean Massa on 23 Oct, 2020 03:29 PM
Thanks!
I updated our package and re-ran our config/index. The latest index output doesn't complain about stopwords file size anymore, but nothing says whether or not it found stopwords or wordforms.
Some test searches make it seem like the wordforms are not being used. Can you confirm?
Support Staff 11 Posted by Pat Allan on 24 Oct, 2020 01:07 AM
Hi Sean,
I can confirm that both files are coming through and being included in the configuration file correctly.
As to whether they’re working: if there are particular queries you’re running that aren’t returning the right records, let me know, but I just did some very simple tests:
Using the first wordform, “clinical abstractor > clinical_abstractor” - I ran queries on both of those as search terms, and they return the exact same results. Sphinx’s keyword information in the query response also suggests it’s functioning correctly, as it returns the keyword “clinical_abstractor” even when I search for that as separate words (clinical abstractor).
For the stopwords, I searched for ‘hour’ (which is the fifth line in the stopwords file), and no results were returned, which is what I’d expect.
So I feel like the files are having the appropriate impact - but yeah, if there’s something that doesn’t look right to you, do let me know.
Cheers,
—
Pat
12 Posted by Sean Massa on 24 Oct, 2020 05:28 PM
We do see some results that make it look like things are working.
However, we put in a unique term to try to make sure. Perhaps we did this improperly, though. You can see a mapping where we added `poopsmith`:
```
plumber, master plumber, service plumber, poopsmith => plumber
```
But searching for that returns nothing.
If this is an issue with how we're using sphinx and not how you are hosting it, then we're happy to figure it out on our own. If you have any additional guidance, we appreciate it.
Thanks for the help!
Support Staff 13 Posted by Pat Allan on 25 Oct, 2020 01:45 AM
Hi Sean,
I’m just reading through the docs, and I’m not sure that comma-separated terms are something Sphinx expects in wordforms files.
http://sphinxsearch.com/docs/current.html#conf-wordforms
So, you might need to split this example into a few lines instead.
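As a rough sketch of that split (not something from the Sphinx docs or flying-sphinx-js), a small Node helper could expand each comma-separated line into one `source > destination` mapping per line. The function name `expandWordforms` is hypothetical, and skipping terms that map to themselves is an assumption about what you'd want rather than a Sphinx requirement:

```javascript
// Hypothetical helper: turns "a, b, c => dest" into one "term > dest" line per term.
function expandWordforms(text) {
  const out = [];
  for (const rawLine of text.split(/\r?\n/)) {
    const line = rawLine.trim();
    if (line === '') continue; // drop blank lines

    // Accept either "=>" or ">" between the source terms and the destination.
    const match = line.match(/^(.*?)\s*=?>\s*(.+)$/);
    if (!match) {
      out.push(line); // not a mapping line: pass it through untouched
      continue;
    }

    const [, sources, destination] = match;
    for (const source of sources.split(',')) {
      const term = source.trim();
      // Assumption: a term that maps to itself adds nothing, so leave it out.
      if (term === '' || term === destination) continue;
      out.push(`${term} > ${destination}`);
    }
  }
  return out.join('\n');
}

console.log(expandWordforms(
  'plumber, master plumber, service plumber, poopsmith => plumber'
));
```

With the example line from this thread, that should give three separate mappings (master plumber, service plumber, and poopsmith, each pointing at plumber), which can then be written back to wordforms.txt before reindexing.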
I don’t think this is related to Flying Sphinx, but certainly happy to help provide suggestions for debugging the issue anyway, if I can think of anything!
Cheers,
—
Pat
14 Posted by Sean Massa on 26 Oct, 2020 06:13 PM
The docs weren't clear. I tried changing the format to match and it worked!
Thanks for your help!
Support Staff 15 Posted by Pat Allan on 26 Oct, 2020 11:40 PM
Great to hear it’s working! :)