Topic: [Bug] Forum searching ignores underscores

Posted under Site Bug Reports & Feature Requests

Bug summary:
When searching the forums, the search ignores underscores.

Steps to reproduce:
1. Go to the Forums section
2. Click the search box
3. Enter the word: twelve_balls
4. Press the Search button

Expected results:
A search results table containing a link to 1 post (this one).

Actual results:
A search results table containing links to 4 posts.

Updated

This is due to how the search index is optimized. It splits text into small chunks (individual words) and ranks them by relevance. One thing it does is drop plurals and other endings. For instance, "keep, keeps, keeping, keeper, keeped" are all stored under the same entry and will return the same results in a search (notice how the last one isn't actually a word). There are a lot of shortcuts like this that make the search index faster, but since it isn't designed for our specific case, it's not exactly reliable.
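To illustrate the "same entry" idea, here is a toy sketch. It is NOT the forum's actual search engine (which isn't named here) and NOT a real stemmer like Porter/Porter2: `toy_stem`, `SUFFIXES`, `build_index` and `search` are all hypothetical names, and the suffix list is deliberately crude. It just shows how stripping endings before indexing makes every variant of "keep" land on one index key.

```python
# Toy suffix-stripping "stemmer": NOT Porter/Porter2, illustration only.
SUFFIXES = ("ing", "ed", "er", "s")  # hypothetical, deliberately crude list


def toy_stem(word: str) -> str:
    """Lowercase the word and strip the first matching suffix,
    keeping a stem of at least 3 letters."""
    word = word.lower()
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def build_index(posts: dict[int, str]) -> dict[str, set[int]]:
    """Inverted index: stemmed word -> set of post IDs containing it."""
    index: dict[str, set[int]] = {}
    for post_id, text in posts.items():
        for word in text.split():
            index.setdefault(toy_stem(word), set()).add(post_id)
    return index


def search(index: dict[str, set[int]], word: str) -> set[int]:
    """Look up a query word under its stemmed key."""
    return index.get(toy_stem(word), set())


if __name__ == "__main__":
    # All five variants collapse to the same key, "keep".
    print({toy_stem(w) for w in ["keep", "keeps", "keeping", "keeper", "keeped"]})
    index = build_index({1: "I keep it", 2: "she keeps them", 3: "keeping score"})
    print(search(index, "keeper"))
```

Searching for "keeper" returns all three posts even though none of them contains that exact word, which is the speed-for-precision trade-off described above.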

The good news is I'm in the process of revamping the forum search to not be completely terrible. I'll keep what you mentioned in mind and see if there's a way to fix that.

Updated by anonymous

What you described is called stemming (done by a stemming algorithm, or stemmer). But stemming isn't really what this bug is about.

The problem is that the underscores are already dropped before the stemming algorithm even comes into play: before "balls" can be stemmed to "ball", the underscore is already lost. So to fix this bug, you don't need to revamp the whole search; you just need a small fix to the lexical analysis (tokenizing / text sanitizing, or whatever it's called there) at the beginning of the indexing process.
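A minimal sketch of that point, assuming the indexer splits text on anything that isn't a letter or digit (we don't actually know what the forum's tokenizer does; `LOSSY_TOKEN`, `FIXED_TOKEN`, `tokenize`, `search` and the sample posts are hypothetical). No stemming is involved here at all: the query "twelve_balls" is already broken into "twelve" and "balls" by the tokenizer, so it matches any post containing both words. Allowing "_" inside a token is enough to make it match only the literal term.

```python
import re

# Hypothetical lossy tokenizer: "_" counts as a separator, so
# "twelve_balls" becomes two tokens (the behaviour in the bug report).
LOSSY_TOKEN = re.compile(r"[A-Za-z0-9]+")

# Fixed tokenizer: "_" is allowed inside a token.
FIXED_TOKEN = re.compile(r"[A-Za-z0-9_]+")

# Hypothetical sample posts.
POSTS = {
    1: "Twelve balls on the table",
    2: "Search for twelve_balls",
    3: "balls, twelve of them",
}


def tokenize(text: str, pattern: re.Pattern) -> list[str]:
    """Split text into lowercase tokens using the given pattern."""
    return [token.lower() for token in pattern.findall(text)]


def search(posts: dict[int, str], query: str, pattern: re.Pattern) -> list[int]:
    """Return sorted IDs of posts containing every query token."""
    wanted = set(tokenize(query, pattern))
    return sorted(
        post_id
        for post_id, text in posts.items()
        if wanted <= set(tokenize(text, pattern))
    )


if __name__ == "__main__":
    print(search(POSTS, "twelve_balls", LOSSY_TOKEN))  # every post with both words
    print(search(POSTS, "twelve_balls", FIXED_TOKEN))  # only the literal term
```

Any stemming would then run on the already-correct token, so "twelve_balls" could at worst become something like "twelve_ball", which still doesn't collide with plain "twelve" or "balls".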

Btw, in general, Porter2 is the best overall stemming algorithm for English.

Updated by anonymous
