I ran into an issue when indexing and searching for strings containing the Swedish characters åäö. Fortunately, there is a simple fix for this.
You can read more about Unicode Character Folding at https://www.elastic.co/guide/en/elasticsearch/guide/current/character-folding.html
I had some problems when I tried to apply the changes as described in the article above, but I found a way around them.
So the necessary steps are:
- Close <MYINDEX> with _close
- Make changes to <MYINDEX> via a PUT call
- Open <MYINDEX> with _open
- Reindex your data
NOTE: Remember to change <MYINDEX> to the actual name of the index you intend to modify!
Step 1
curl -X POST 'http://localhost:9200/<MYINDEX>/_close?pretty=1'
Step 2
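The snippet below has been taken from the article linked above. The unicodeSetFilter lists the characters that should be left alone by the folding filter, so you might need to change it depending on which characters you want to keep. Note that icu_folding and icu_tokenizer come from the ICU analysis plugin (analysis-icu), which has to be installed on the server.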
PUT /myindex
{
  "settings": {
    "analysis": {
      "filter": {
        "swedish_folding": {
          "type": "icu_folding",
          "unicodeSetFilter": "[^åäöÅÄÖ]"
        }
      },
      "analyzer": {
        "swedish_analyzer": {
          "tokenizer": "icu_tokenizer",
          "filter": [ "swedish_folding", "lowercase" ]
        }
      }
    }
  }
}
This is how it would look if you intend to use curl. Note that the request goes to the _settings endpoint of the existing index:
curl -X PUT -H "Content-Type: application/json" -d '{"settings": { "analysis": { "filter": { "swedish_folding": { "type": "icu_folding", "unicodeSetFilter": "[^åäöÅÄÖ]"}}, "analyzer": {"swedish_analyzer": { "tokenizer": "icu_tokenizer", "filter": [ "swedish_folding", "lowercase" ]}}}}}' "localhost:9200/<MYINDEX>/_settings?pretty=1"
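To get a feel for what the swedish_folding and lowercase filters do to a token, here is a minimal Python sketch. It is only an approximation built on the standard unicodedata module, not the ICU implementation that Elasticsearch uses, and the names PROTECTED, fold_char and swedish_fold are made up for this illustration.

```python
import unicodedata

# Characters excluded from folding, mirroring the unicodeSetFilter "[^åäöÅÄÖ]".
PROTECTED = set("åäöÅÄÖ")


def fold_char(ch):
    """Strip combining marks from one character unless it is protected."""
    if ch in PROTECTED:
        return ch
    decomposed = unicodedata.normalize("NFKD", ch)
    return "".join(c for c in decomposed if not unicodedata.combining(c))


def swedish_fold(text):
    """Rough stand-in for the swedish_folding + lowercase filter chain."""
    # Compose first, so that "a" followed by a combining ring is seen as "å".
    composed = unicodedata.normalize("NFC", text)
    folded = "".join(fold_char(ch) for ch in composed)
    # The protected uppercase letters Å, Ä and Ö skip folding, which is why
    # the analyzer lowercases as a separate, final step.
    return folded.lower()


print(swedish_fold("Räksmörgås"))  # å, ä and ö are kept
print(swedish_fold("Café Crème"))  # other accents are folded away
```

With this in place, a search for "cafe" would match "Café", while "rak" would not match "räk", which is the behaviour you typically want for Swedish text.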
Step 3
curl -X POST 'http://localhost:9200/<MYINDEX>/_open?pretty=1'
Make sure to pay attention to the output from the Elasticsearch server!
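Steps 1 to 3 can also be scripted instead of being typed by hand. The sketch below uses only the Python standard library and sends the same three requests as the curl commands. The helper names build_steps and run, the default base URL and the example index name "myindex" are assumptions made for this illustration, so adjust them to your own setup.

```python
import json
import urllib.request

# Same analysis settings as in the curl command from step 2.
ANALYSIS = {
    "filter": {
        "swedish_folding": {
            "type": "icu_folding",
            "unicodeSetFilter": "[^åäöÅÄÖ]",
        }
    },
    "analyzer": {
        "swedish_analyzer": {
            "tokenizer": "icu_tokenizer",
            "filter": ["swedish_folding", "lowercase"],
        }
    },
}


def build_steps(index, base_url="http://localhost:9200"):
    """Return the (method, url, body) triples for steps 1 to 3, in order."""
    root = f"{base_url}/{index}"
    return [
        ("POST", f"{root}/_close", None),
        ("PUT", f"{root}/_settings", {"settings": {"analysis": ANALYSIS}}),
        ("POST", f"{root}/_open", None),
    ]


def run(index, base_url="http://localhost:9200"):
    """Send the requests one by one and print each server response."""
    for method, url, body in build_steps(index, base_url):
        data = json.dumps(body).encode("utf-8") if body is not None else None
        request = urllib.request.Request(
            url,
            data=data,
            method=method,
            headers={"Content-Type": "application/json"},
        )
        # urlopen raises HTTPError on a 4xx/5xx answer, which stops the
        # sequence before the next step is attempted.
        with urllib.request.urlopen(request) as response:
            print(method, url, response.status)
            print(response.read().decode("utf-8"))


# Example call (hypothetical index name, needs a running server):
# run("myindex")
```

Be aware that if the settings request fails, the index is left closed, so check the error message and open the index again manually with the command from step 3.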
Step 4
I am using a Django library named django-elasticsearch-dsl (https://github.com/sabricot/django-elasticsearch-dsl), so I would run the command below to rebuild my index.
$ python manage.py search_index --rebuild