The History of AI
More than 70 years: from thought experiment to revolution.
In this section, we'll trace the incredible journey of artificial intelligence — from its origins as an academic curiosity to the technology that's reshaping every industry.
The Beginnings (1950–1997)
Key milestones that shaped the field:
- 1950 — Alan Turing publishes "Computing Machinery and Intelligence" — the Turing Test
- 1956 — Dartmouth Conference coins the term "Artificial Intelligence"
- 1966 — ELIZA, the first chatbot (pattern matching, no understanding)
- 1980s — Expert Systems boom → overpromise → First AI Winter
- 1997 — IBM's Deep Blue defeats world chess champion Garry Kasparov
"Can machines think?" — Alan Turing, 1950
The Bridge Era (1997–2012)
The quiet years that built the foundation:
- 2006 — Geoffrey Hinton and colleagues publish the deep belief network paper, showing deep neural networks can be trained effectively layer by layer
- 2011 — IBM Watson wins Jeopardy!
- 2012 — AlexNet wins ImageNet → deep learning goes mainstream
- 2012 — Google Brain recognizes cats in YouTube videos (unsupervised)
The key insight: more data + more compute = better results. This simple formula would drive the next decade.
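This "data + compute" formula was later made precise by neural scaling laws: loss falls as a power law in model size and dataset size. The sketch below illustrates the idea using coefficients roughly in the spirit of published fits; treat the exact numbers as illustrative, not authoritative.

```python
# Toy illustration of "more data + more compute = better results".
# Scaling-law research found loss falls as a power law in model size N and
# dataset size D: L = E + A/N^alpha + B/D^beta. The coefficients below are
# illustrative placeholders -- only the power-law *shape* is the point.

def toy_loss(n_params: float, n_tokens: float,
             a: float = 406.4, alpha: float = 0.34,
             b: float = 410.7, beta: float = 0.28,
             irreducible: float = 1.69) -> float:
    """Estimated loss for a model with n_params weights trained on n_tokens tokens."""
    return irreducible + a / n_params**alpha + b / n_tokens**beta

small = toy_loss(1e8, 1e10)    # ~100M params, ~10B tokens
large = toy_loss(1e11, 1e12)   # ~100B params, ~1T tokens
assert large < small           # scaling up both axes lowers loss
```

Note the `irreducible` term: no amount of scale drives loss to zero, which is why the curve bends rather than plunging forever.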
The Transformer Revolution (2012–2022)
Everything changed with one paper:
- 2017 — "Attention Is All You Need" — the Transformer architecture
- 2018 — GPT-1 (117M parameters) — first generative pre-trained transformer
- 2019 — GPT-2 (1.5B) — "too dangerous to release"
- 2020 — GPT-3 (175B) — few-shot learning emerges
- 2021 — GitHub Copilot — AI writes code
- 2022 — ChatGPT launches — 100 million users in 2 months
This was the moment AI went from research to mainstream.
The Big Bang (2022–2026)
We're living through the fastest technology adoption in history:
| Year | Milestone |
|---|---|
| 2023 | GPT-4, Claude 2, Llama 2 (open source) |
| 2024 | Claude 3.5 Sonnet, GPT-4o, Gemini 1.5, Llama 3 |
| 2025 | Claude 4, GPT-5, DeepSeek R1, Qwen 3 |
| 2026 | Claude Opus 4.6, GPT-5.3, agentic AI goes mainstream |
Every 6 months, capabilities that seemed impossible become routine.
What is GPT?
Let's break down the name:
- G = Generative — it creates new content (text, code, images)
- P = Pre-trained — trained on massive datasets before you use it
- T = Transformer — the architecture that makes it all work
GPT is not the only transformer-based model family, but it's the most influential. Others include Claude (trained with constitutional AI), Gemini (natively multimodal), and Llama (open weights).
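The "T" above deserves one concrete picture. The core operation of the Transformer is scaled dot-product attention from the 2017 paper: every token computes a weighted mix of every other token's representation. A minimal sketch with toy data (shapes and values are arbitrary, not from a real model):

```python
import numpy as np

def attention(Q, K, V):
    """Scaled dot-product attention: softmax(Q K^T / sqrt(d_k)) V."""
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)                  # token-to-token similarity
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)   # row-wise softmax
    return weights @ V                               # weighted mix of values

rng = np.random.default_rng(0)
Q = rng.normal(size=(4, 8))   # 4 tokens, 8-dimensional queries
K = rng.normal(size=(4, 8))
V = rng.normal(size=(4, 8))
out = attention(Q, K, V)
assert out.shape == (4, 8)    # one mixed representation per token
```

Because every token attends to every other token in one step, attention parallelizes far better than the recurrent networks it replaced — that hardware fit is a big part of why transformers won.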
Training vs. Inference
A critical distinction:
Training (months, millions of $)
- Feed billions of text samples
- Adjust billions of weights
- Happens once (or periodically)
- Result: a frozen model
Inference (milliseconds, cents)
- User sends a prompt
- Model predicts next tokens
- No learning happens
- Model doesn't remember your conversation
Key insight: your prompts never change the model's weights. When you "teach" ChatGPT something in a conversation, it adapts only via the text in its context window — and when the session ends, that context is gone.
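The frozen-model distinction can be made concrete with a deliberately tiny sketch. Here a hypothetical hand-built lookup table stands in for billions of frozen weights; the point is that generation only *reads* the model, never writes to it:

```python
# Toy sketch of inference on a frozen model. FROZEN_MODEL is a hypothetical
# stand-in for trained weights: it maps a token sequence to the next token.

FROZEN_MODEL = {                      # fixed at "training time"
    ("the",): "cat",
    ("the", "cat"): "sat",
    ("the", "cat", "sat"): ".",
}

def generate(prompt: tuple, max_tokens: int = 10) -> tuple:
    """Inference: repeatedly predict the next token. Never mutates FROZEN_MODEL."""
    tokens = list(prompt)
    for _ in range(max_tokens):
        nxt = FROZEN_MODEL.get(tuple(tokens))
        if nxt is None:               # model has no prediction -> stop
            break
        tokens.append(nxt)
    return tuple(tokens)

before = dict(FROZEN_MODEL)
result = generate(("the",))           # -> ("the", "cat", "sat", ".")
assert FROZEN_MODEL == before         # inference left the "weights" untouched
```

A real LLM predicts probability distributions with matrix math instead of table lookups, but the asymmetry is the same: training writes the weights once; inference only reads them.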
---quiz question: When was the term "Artificial Intelligence" first coined? options:
- { text: "1943 — at a mathematics conference", correct: false }
- { text: "1956 — at the Dartmouth Conference", correct: true }
- { text: "1966 — when ELIZA was created", correct: false }
- { text: "1997 — when Deep Blue won", correct: false } feedback: The term was coined at the Dartmouth Conference in 1956 by John McCarthy.
---quiz question: What does the "T" in GPT stand for? options:
- { text: "Technology", correct: false }
- { text: "Training", correct: false }
- { text: "Transformer", correct: true }
- { text: "Turing", correct: false } feedback: GPT = Generative Pre-trained Transformer. The Transformer architecture was introduced in the 2017 paper "Attention Is All You Need".
---quiz question: Why can't an LLM learn from your chat messages? options:
- { text: "It's too expensive to retrain", correct: false }
- { text: "Inference doesn't update model weights — the model is frozen", correct: true }
- { text: "The API blocks learning for privacy", correct: false } feedback: During inference, the model only predicts tokens — it never updates its weights. Training is a separate, expensive process.