This update includes seven word breakers for use with SharePoint Portal Server 2003. They break documents written in the following languages into words:
- Czech
- Danish
- Greek
- Hebrew
- Hungarian
- Norwegian
- Portuguese
For those who are not familiar with word breaking:
Word breaking is the decomposition of text into individual tokens, or words. Many languages, especially those written in the Roman alphabet, use white space and other word separators, together with punctuation, to delimit words, phrases, and sentences. Even so, word breakers must rely on accurate, language-specific heuristics to produce reliable results.
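To make the definition concrete, here is a minimal sketch of a word breaker for languages that separate words with white space and punctuation. The `break_words` function and its regular expression are illustrative assumptions, not the interface of the SharePoint word breakers; real implementations add language-specific rules (compound words, clitics, abbreviations) that this sketch omits.

```python
import re

# Hypothetical helper: a minimal word breaker for languages that delimit words
# with white space and punctuation. A token is a run of Unicode word characters,
# optionally joined by an apostrophe (so "don't" stays one token). Real word
# breakers add language-specific rules that this sketch omits.
TOKEN_PATTERN = re.compile(r"\w+(?:['\u2019]\w+)*")


def break_words(text: str) -> list[str]:
    """Return the word tokens in text, discarding white space and punctuation."""
    return TOKEN_PATTERN.findall(text)


print(break_words("Word breakers don't index punctuation, do they?"))
# ['Word', 'breakers', "don't", 'index', 'punctuation', 'do', 'they']
print(break_words("Dobrý den, světe!"))  # Czech: "Good day, world!"
# ['Dobrý', 'den', 'světe']
```

In Python, `\w` in a `str` pattern matches Unicode letters, so the same rule copes with accented Czech words, but it has no way to find word boundaries in scripts that are written without separators.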
Word breaking is more complex for character-based writing systems or script-based alphabets, where the meaning of individual characters is determined from context. For example, in Japanese, a query that contains the term "京都" ("Kyouto") does not match a document that contains "東京都" ("Tokyo"), even though the last two characters of "東京都" are "京都". The word breaker does not separate the characters in "東京都" ("Tokyo"), so the erroneous term "京都" ("Kyouto") is not in the index.
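The Japanese example can be reproduced with a toy segmenter. This sketch assumes a hypothetical three-word lexicon and a greedy forward-maximum-matching strategy, which is far simpler than a production Japanese word breaker. It shows that a raw substring search finds 京都 (Kyouto) inside 東京都 (Tokyo), while a lookup against the word-broken tokens does not.

```python
# Toy illustration: why character-based scripts need dictionary-driven word
# breaking. The lexicon and the greedy "forward maximum matching" strategy are
# assumptions for this sketch, not SharePoint's Japanese word breaker.
LEXICON = {"東京都", "東京", "京都"}  # hypothetical three-entry dictionary
MAX_LEN = max(len(word) for word in LEXICON)


def segment(text: str) -> list[str]:
    """Greedily take the longest lexicon entry at each position.

    Characters not covered by any entry become single-character tokens.
    """
    tokens = []
    i = 0
    while i < len(text):
        for length in range(min(MAX_LEN, len(text) - i), 0, -1):
            candidate = text[i:i + length]
            if length == 1 or candidate in LEXICON:
                tokens.append(candidate)
                i += length
                break
    return tokens


document = "東京都"  # "Tokyo"
query = "京都"       # "Kyouto"

index = set(segment(document))
print(segment(document))  # ['東京都']
print(query in document)  # True: raw substring search gives a false match
print(query in index)     # False: the word-broken index has no "京都" token
```

Greedy longest match is only one segmentation strategy; it keeps 東京都 whole here simply because that is the longest lexicon entry starting at the first character.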
For more information about linguistic considerations that may affect your word breaker implementation, see Linguistic and Unicode Considerations.