text segmentation

OCR

OCR, short for Optical Character Recognition, is a technology that converts text from images, scans, photographs, screenshots or PDF files into machine-readable text. Instead of forcing a person to retype ...

Chunking

Chunking is the process of splitting longer content into smaller, meaningful parts so that an AI system can store, search, retrieve and use the right information more effectively. Instead of ...
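As a rough illustration of the idea, the sketch below packs whole sentences into chunks up to a size limit. The function name `chunk_text`, the `max_chars` parameter and the sentence-splitting regex are assumptions made for this example. Production systems often split by tokens, headings or semantic boundaries instead.

```python
import re


def chunk_text(text: str, max_chars: int = 200) -> list[str]:
    """Greedily pack whole sentences into chunks of at most max_chars.

    Hypothetical helper for illustration only. A single sentence longer
    than max_chars is kept whole rather than cut mid-sentence.
    """
    # Naive sentence split: break after '.', '!' or '?' followed by whitespace.
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    chunks: list[str] = []
    current = ""
    for sentence in sentences:
        if not sentence:
            continue
        candidate = f"{current} {sentence}".strip()
        if current and len(candidate) > max_chars:
            # Adding this sentence would exceed the limit: close the chunk.
            chunks.append(current)
            current = sentence
        else:
            current = candidate
    if current:
        chunks.append(current)
    return chunks


if __name__ == "__main__":
    sample = "OCR reads text from images. Chunking splits it. Tokens feed the model."
    for i, chunk in enumerate(chunk_text(sample, max_chars=50)):
        print(i, chunk)
```

Keeping sentences intact is one simple way to keep chunks "meaningful". A fixed character cut would be shorter to write but can split a thought in half.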

Token

A token is the basic unit of text or other input that a language model processes. It is not exactly the same as a word, a sentence or a character. ...
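To make the distinction concrete, the toy sketch below counts words, characters and tokens for the same string. The `toy_tokenize` function and its `max_piece` parameter are invented for this example. It slices words into fixed-length pieces, which is not how real tokenizers work: those typically learn their subword vocabulary from data, for example with byte-pair encoding.

```python
import re


def toy_tokenize(text: str, max_piece: int = 4) -> list[str]:
    """Toy tokenizer for illustration only (not a real model's tokenizer).

    Splits text into words and punctuation marks, then slices each word
    into pieces of at most max_piece characters.
    """
    tokens: list[str] = []
    # \w+ matches a run of word characters; [^\w\s] matches one punctuation mark.
    for unit in re.findall(r"\w+|[^\w\s]", text):
        for start in range(0, len(unit), max_piece):
            tokens.append(unit[start:start + max_piece])
    return tokens


if __name__ == "__main__":
    text = "Tokenization matters."
    print("words:     ", len(text.split()))
    print("characters:", len(text))
    print("tokens:    ", toy_tokenize(text))
```

The three counts differ for the same input, which is the point: a token count cannot be read directly off a word count or a character count.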
