NLP

Natural Language Processing

Natural Language Processing (NLP) is the field of AI that enables computers to understand, interpret, and generate human language. It powers machine translation, sentiment analysis, named entity recognition, question answering, and automatic speech recognition (ASR) across written and spoken modalities.

Modern NLP is driven by pre-trained transformer-based models: encoder-only models such as BERT excel at natural language understanding (NLU) tasks like classification and information extraction, while decoder-only large language models (LLMs) such as the GPT family are used for text generation and zero-shot reasoning, typically steered through prompt engineering.

NLP tasks span the full pipeline from tokenization and vector embedding (converting text into numerical representations, which can be stored in a vector database) through semantic understanding to generation. Foundation models, often paired with retrieval-augmented generation (RAG) architectures, now allow a single pre-trained model to perform dozens of NLP tasks without task-specific retraining.
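The first stages of that pipeline can be sketched in a few lines of Python. This is a toy illustration only, under several assumptions: the regex tokenizer stands in for a real subword tokenizer, the word-count vectors stand in for learned embeddings, and the in-memory `index` list stands in for a vector database. The names `tokenize`, `embed`, `cosine`, and `search` are hypothetical and not taken from any library.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into word tokens (toy tokenizer)."""
    return re.findall(r"[a-z0-9]+", text.lower())


def embed(text, vocabulary):
    """Map text to a count vector over a fixed vocabulary (toy embedding)."""
    counts = Counter(tokenize(text))
    return [float(counts[term]) for term in vocabulary]


def cosine(a, b):
    """Cosine similarity between two vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


documents = [
    "The translation model converts English text to French.",
    "Sentiment analysis labels a review as positive or negative.",
    "Speech recognition turns audio into text.",
]

# Build the vocabulary from the documents, then store one vector per document.
vocabulary = sorted({tok for doc in documents for tok in tokenize(doc)})
index = [(doc, embed(doc, vocabulary)) for doc in documents]  # stand-in for a vector database


def search(query):
    """Return the stored document whose vector is closest to the query vector."""
    query_vector = embed(query, vocabulary)
    return max(index, key=lambda item: cosine(query_vector, item[1]))[0]


print(search("Is this review positive or negative?"))
```

Here the query shares the words "review", "positive", "or", and "negative" with the second document only, so that document is returned. A production system would replace the count vectors with embeddings from a pre-trained model, which can match text by meaning rather than by shared words.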

Natural language processing — text understanding pipeline

Frequently Asked Questions

What is the difference between NLP and NLU?

Natural Language Processing (NLP) is the broader field covering all computational work with human language, including both understanding and generation. Natural Language Understanding (NLU) is a subset of NLP focused specifically on interpreting the meaning, intent, and context of input text — essentially the comprehension side. Natural Language Generation (NLG) is the complementary subset that covers producing text from structured data or internal representations.

How has NLP changed with the rise of LLMs?

Before large language models, NLP tasks typically required separate specialised models: one for sentiment analysis, another for translation, another for named entity recognition. LLMs unified NLP by demonstrating that a single large pre-trained model can perform all these tasks through prompting or minimal fine-tuning. This paradigm shift, beginning with BERT (2018) and accelerating with GPT-3 (2020), reduced the need for task-specific feature engineering and custom architectures.
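As a rough sketch of what "through prompting" means in practice, the snippet below builds different task instructions around the same input text. The template wording, the `PROMPT_TEMPLATES` dictionary, and the `build_prompt` helper are all hypothetical, and no model is called here: in a real system each resulting string would be sent to the same LLM.

```python
# Hypothetical prompt templates: one pre-trained model, three different tasks.
PROMPT_TEMPLATES = {
    "sentiment": (
        "Classify the sentiment of the following text as positive or negative.\n\n"
        "Text: {text}\nSentiment:"
    ),
    "translation": (
        "Translate the following text into French.\n\n"
        "Text: {text}\nTranslation:"
    ),
    "ner": (
        "List the named entities (people, organisations, places) in the following text.\n\n"
        "Text: {text}\nEntities:"
    ),
}


def build_prompt(task, text):
    """Fill the template for the requested task with the input text."""
    if task not in PROMPT_TEMPLATES:
        raise ValueError(f"unknown task: {task}")
    return PROMPT_TEMPLATES[task].format(text=text)


review = "The new phone from Acme arrived in Paris on time and works well."
for task in PROMPT_TEMPLATES:
    print(build_prompt(task, review))
    print("---")
```

Only the instruction changes between tasks. The model and its weights stay the same, which is the contrast with the earlier one-model-per-task approach.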

What languages does NLP work with?

NLP works with any natural language, but model quality varies significantly by language. English is the best-resourced language in NLP: the majority of training data and benchmarks are in English. Multilingual models extend NLP capabilities well beyond English (mBERT covers roughly 100 languages and NLLB-200 covers 200), but low-resource languages (those with limited digital text) still see substantially lower performance than English. Languages with non-Latin scripts or complex morphology (e.g. Arabic, Chinese, Finnish) present additional modelling challenges.

See Also