SharePoint Portal Server 2003 Word Breaker Update
This update includes seven word breakers for use with SharePoint Portal Server 2003. They break documents into words for the following languages:
  • Czech
  • Danish
  • Greek
  • Hebrew
  • Hungarian
  • Norwegian
  • Portuguese

Download here

For those who don't know what word breaking is:

Word breaking is the decomposition of text into individual text tokens, or words. Many languages, especially those with Roman alphabets, have an array of word separators (such as white space) and punctuation that are used to discern words, phrases, and sentences. Word breakers must rely on accurate language heuristics to provide reliable and accurate results.
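To make the idea concrete, here is a minimal Python sketch of a separator-based word breaker. It is an illustration only, not how the SharePoint word breakers are implemented; the function name `break_words` is made up for this example, and the only "heuristic" is a regular expression that treats white space and punctuation as separators.

```python
import re


def break_words(text):
    """Split text into word tokens at white space and punctuation.

    Hypothetical helper for illustration: \\w+ matches runs of letters,
    digits, and underscores, so everything else acts as a separator.
    """
    return re.findall(r"\w+", text)


if __name__ == "__main__":
    # Plain separators work well for simple text.
    print(break_words("Hello, world!"))
    # Abbreviations and contractions show why real word breakers
    # need language-specific heuristics on top of separators.
    print(break_words("e.g. don't"))
```

The second call breaks "e.g." and "don't" into fragments, which is the kind of case where language-specific rules matter.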

Word breaking is more complex for character-based systems of writing or script-based alphabets, where the meaning of individual characters is determined from context. For example, in Japanese, a query that contains the term "京都" ("Kyouto") does not match a document that contains "東京都" ("Tokyo"). The word breaker does not separate the characters in "東京都" ("Tokyo"), so the erroneous term "京都" ("Kyouto") is not in the index.
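The Kyouto/Tokyo example can be sketched in Python as follows. Everything here is an assumption made for illustration: the two terms are taken to be 京都 and 東京都, the lexicon is a two-entry toy, and `break_longest_match`, `break_bigrams`, and `build_index` are hypothetical helpers, not part of any SharePoint API. The sketch compares a dictionary-based breaker that keeps "東京都" whole with a naive breaker that emits overlapping character pairs.

```python
# Toy lexicon (assumption): a real word breaker uses far richer language data.
LEXICON = {"東京都", "京都"}


def break_longest_match(text, lexicon=LEXICON):
    """Greedy longest-match breaker: prefer the longest known word."""
    tokens = []
    max_len = max(len(word) for word in lexicon)
    i = 0
    while i < len(text):
        for size in range(min(max_len, len(text) - i), 0, -1):
            candidate = text[i:i + size]
            # Fall back to a single character when nothing in the lexicon fits.
            if size == 1 or candidate in lexicon:
                tokens.append(candidate)
                i += size
                break
    return tokens


def break_bigrams(text):
    """Naive breaker: every overlapping pair of characters is a token."""
    return [text[i:i + 2] for i in range(len(text) - 1)] or [text]


def build_index(documents, breaker):
    """Map each token to the set of document ids that contain it."""
    index = {}
    for doc_id, text in documents.items():
        for token in breaker(text):
            index.setdefault(token, set()).add(doc_id)
    return index


if __name__ == "__main__":
    docs = {"doc1": "東京都"}
    good = build_index(docs, break_longest_match)
    naive = build_index(docs, break_bigrams)
    print("京都" in good)   # the whole-word index has no entry for the query
    print("京都" in naive)  # the bigram index contains the erroneous term
```

With the longest-match breaker the query term is absent from the index, so the Tokyo document is not returned; with the bigram breaker the erroneous term is indexed and the document would match.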

For more information about linguistic considerations that may affect your word breaker implementation, see Linguistic and Unicode Considerations.

Monday, August 29, 2005 4:20:18 AM UTC

 

All content © 2012, Mart Muller

