May 02, 2014

What is tokenization?

The term “tokenization” came up during a presentation on data security today, but it wasn’t really explained. What is tokenization?


"Tokenization is the process of breaking a stream of text up into words, phrases, symbols, or other meaningful elements called tokens. The list of tokens becomes input for further processing such as parsing or text mining. Tokenization is useful both in linguistics (where it is a form of text segmentation), and in computer science, where it forms part of lexical analysis."