I have created a script which downloads news from several news sources every day. I'm using a fulltext index on the news text so I can find relevant news by searching for specific keywords.
The problem is that my news sources are in different languages. Should I make one table for each language so that each gets its own fulltext index? What is the best way to get good search results when handling text in different languages?
I hope my question is clear and that some experienced MySQL gurus can help me answer it!
Upgrade to MySQL 5 and use utf8 for your character encoding. Make sure to really read up on it, as PHP does not yet easily support UTF-8 in all its native functions.
I'm not asking about the character encoding. I'm asking about the word weight in the index. For example, a word counts as zero if it exists in 50% or more of the rows. With multiple languages in one table I will mess this up, because the probability that a word exists in more than 50% of the rows drops dramatically (different languages, different words). How can I best solve this?
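To make the concern concrete, here is a small Python sketch of the 50% rule described above. It is only a simulation, not MySQL's actual implementation: the `common_words` helper and the sample rows (English and German headlines) are made up for illustration, and it assumes the threshold behaviour of natural-language fulltext search on MyISAM tables.

```python
def common_words(rows, threshold=0.5):
    """Return the words that occur in at least `threshold` of the rows.

    Rough stand-in for the 50% rule: such words would be treated
    as too common to be useful in a natural-language search.
    """
    counts = {}
    for text in rows:
        # Count each word once per row, however often it repeats.
        for word in set(text.lower().split()):
            counts[word] = counts.get(word, 0) + 1
    return {w for w, n in counts.items() if n / len(rows) >= threshold}


# Hypothetical sample data: four rows per language.
english = [
    "the election results",
    "the market fell",
    "the storm hit the coast",
    "oil prices rise",
]
german = [
    "der markt sinkt",
    "der sturm kam",
    "der preis steigt",
    "wahl in berlin",
]

print(common_words(english))           # common within English rows only
print(common_words(german))            # common within German rows only
print(common_words(english + german))  # common in one mixed table
```

In this toy data "the" and "der" each pass the threshold inside their own language, but neither does once the rows are mixed, so very common words would keep a nonzero weight in a single combined table.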
As far as I can see, the best solution must be separate tables for the different languages.
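One possible way to lay that out, sketched in Python that only builds the SQL strings. Everything here is an assumption for illustration: the `news_<lang>` table names, the `title` and `body` columns, the language list, and the `%s` placeholder style (which depends on the database driver you use).

```python
# Hypothetical set of languages; one table per entry.
LANGUAGES = ("en", "de")


def check_language(lang):
    # Table names cannot be bound as query parameters, so only
    # accept codes from the known list.
    if lang not in LANGUAGES:
        raise ValueError("unsupported language: %r" % lang)
    return lang


def create_table_sql(lang):
    """Build the CREATE TABLE statement for one language."""
    lang = check_language(lang)
    return (
        "CREATE TABLE news_%s ("
        "id INT UNSIGNED AUTO_INCREMENT PRIMARY KEY, "
        "title VARCHAR(255), "
        "body TEXT, "
        "FULLTEXT (title, body)"
        ") ENGINE=MyISAM DEFAULT CHARSET=utf8" % lang
    )


def search_sql(lang):
    """Build a fulltext search against that language's table only."""
    lang = check_language(lang)
    return (
        "SELECT id, title FROM news_%s "
        "WHERE MATCH (title, body) AGAINST (%%s)" % lang
    )


print(create_table_sql("en"))
print(search_sql("de"))
```

Each table then has its own fulltext index, so word frequencies are computed per language. The trade-off is that the script has to know the language of every article when it stores it and of every query when it searches.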