The History of AI
More than 70 years: from thought experiment to revolution.
In this section, we'll trace the incredible journey of artificial intelligence — from its origins as an academic curiosity to the technology that's reshaping every industry.
The Beginnings (1950–1997)
Key milestones that shaped the field:
- 1950 — Alan Turing publishes "Computing Machinery and Intelligence" — the Turing Test
- 1956 — Dartmouth Conference coins the term "Artificial Intelligence"
- 1966 — ELIZA, the first chatbot (pattern matching, no understanding)
- 1980s — Expert Systems boom → overpromise → First AI Winter
- 1997 — IBM's Deep Blue defeats world chess champion Garry Kasparov
"Can machines think?" — Alan Turing, 1950
The Bridge Era (1997–2012)
The quiet years that built the foundation:
- 2006 — Geoffrey Hinton and colleagues publish the deep belief network paper, showing deep neural networks can be trained effectively layer by layer
- 2011 — IBM Watson wins Jeopardy!
- 2012 — AlexNet wins ImageNet → deep learning goes mainstream
- 2012 — Google Brain recognizes cats in YouTube videos (unsupervised)
The key insight: more data + more compute = better results. This simple formula would drive the next decade.
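This "data + compute" formula was later made precise by neural scaling laws: loss falls as a power law in model size and dataset size. The sketch below illustrates the idea using coefficients roughly in the spirit of published fits; treat the exact numbers as illustrative, not authoritative.

```python
# Toy illustration of "more data + more compute = better results".
# Scaling-law research found loss falls as a power law in model size N and
# dataset size D: L = E + A/N^alpha + B/D^beta. The coefficients below are
# illustrative placeholders -- only the power-law *shape* is the point.

def toy_loss(n_params: float, n_tokens: float,
             a: float = 406.4, alpha: float = 0.34,
             b: float = 410.7, beta: float = 0.28,
             irreducible: float = 1.69) -> float:
    """Estimated loss for a model with n_params weights trained on n_tokens tokens."""
    return irreducible + a / n_params**alpha + b / n_tokens**beta

small = toy_loss(1e8, 1e10)    # ~100M params, ~10B tokens
large = toy_loss(1e11, 1e12)   # ~100B params, ~1T tokens
assert large < small           # scaling up both axes lowers loss
```

Note the `irreducible` term: no amount of scale drives loss to zero, which is why the curve bends rather than plunging forever.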
The Transformer Revolution (2012–2022)
Everything changed with one paper:
- 2017 — "Attention Is All You Need" — the Transformer architecture
- 2018 — GPT-1 (117M parameters) — first generative pre-trained transformer
- 2019 — GPT-2 (1.5B) — "too dangerous to release"
- 2020 — GPT-3 (175B) — few-shot learning emerges
- 2021 — GitHub Copilot — AI writes code
- 2022 — ChatGPT launches — 100 million users in 2 months
This was the moment AI went from research to mainstream.
The Big Bang (2022–2026)
We're living through the fastest technology adoption in history:
| Year | Milestone |
|---|---|
| 2023 | GPT-4, Claude 2, Llama 2 (open source) |
| 2024 | Claude 3.5 Sonnet, GPT-4o, Gemini 1.5, Llama 3 |
| 2025 | Claude 4, GPT-5, DeepSeek R1, Qwen 3 |
| 2026 | Claude Opus 4.6, GPT-5.3, agentic AI goes mainstream |
Every 6 months, capabilities that seemed impossible become routine.
What is GPT?
Let's break down the name:
- G = Generative — it creates new content (text, code, images)
- P = Pre-trained — trained on massive datasets before you use it
- T = Transformer — the architecture that makes it all work
GPT is not the only transformer-based model family, but it's the most influential. Others include Claude (trained with constitutional AI), Gemini (natively multimodal), and Llama (open weights).
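The "T" above deserves one concrete picture. The core operation of the Transformer is scaled dot-product attention from the 2017 paper: every token computes a weighted mix of every other token's representation. A minimal sketch with toy data (shapes and values are arbitrary, not from a real model):

```python
import numpy as np

def attention(Q, K, V):
    """Scaled dot-product attention: softmax(Q K^T / sqrt(d_k)) V."""
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)                  # token-to-token similarity
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)   # row-wise softmax
    return weights @ V                               # weighted mix of values

rng = np.random.default_rng(0)
Q = rng.normal(size=(4, 8))   # 4 tokens, 8-dimensional queries
K = rng.normal(size=(4, 8))
V = rng.normal(size=(4, 8))
out = attention(Q, K, V)
assert out.shape == (4, 8)    # one mixed representation per token
```

Because every token attends to every other token in one step, attention parallelizes far better than the recurrent networks it replaced — that hardware fit is a big part of why transformers won.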
Training vs. Inference
A critical distinction:
Training (months, millions of $)
- Feed billions of text samples
- Adjust billions of weights
- Happens once (or periodically)
- Result: a frozen model
Inference (milliseconds, cents)
- User sends a prompt
- Model predicts next tokens
- No learning happens
- Model doesn't remember your conversation
Key insight: your prompts never change the model's weights. When you "teach" ChatGPT something in a conversation, it adapts only via the text in its context window — and when the session ends, that context is gone.
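The frozen-model distinction can be made concrete with a deliberately tiny sketch. Here a hypothetical hand-built lookup table stands in for billions of frozen weights; the point is that generation only *reads* the model, never writes to it:

```python
# Toy sketch of inference on a frozen model. FROZEN_MODEL is a hypothetical
# stand-in for trained weights: it maps a token sequence to the next token.

FROZEN_MODEL = {                      # fixed at "training time"
    ("the",): "cat",
    ("the", "cat"): "sat",
    ("the", "cat", "sat"): ".",
}

def generate(prompt: tuple, max_tokens: int = 10) -> tuple:
    """Inference: repeatedly predict the next token. Never mutates FROZEN_MODEL."""
    tokens = list(prompt)
    for _ in range(max_tokens):
        nxt = FROZEN_MODEL.get(tuple(tokens))
        if nxt is None:               # model has no prediction -> stop
            break
        tokens.append(nxt)
    return tuple(tokens)

before = dict(FROZEN_MODEL)
result = generate(("the",))           # -> ("the", "cat", "sat", ".")
assert FROZEN_MODEL == before         # inference left the "weights" untouched
```

A real LLM predicts probability distributions with matrix math instead of table lookups, but the asymmetry is the same: training writes the weights once; inference only reads them.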
---quiz question: When was the term "Artificial Intelligence" first coined? options:
- { text: "1943 — at a mathematics conference", correct: false }
- { text: "1956 — at the Dartmouth Conference", correct: true }
- { text: "1966 — when ELIZA was created", correct: false }
- { text: "1997 — when Deep Blue won", correct: false } feedback: The term was coined at the Dartmouth Conference in 1956 by John McCarthy.
---quiz question: What does the "T" in GPT stand for? options:
- { text: "Technology", correct: false }
- { text: "Training", correct: false }
- { text: "Transformer", correct: true }
- { text: "Turing", correct: false } feedback: GPT = Generative Pre-trained Transformer. The Transformer architecture was introduced in the 2017 paper "Attention Is All You Need".
---quiz question: Why can't an LLM learn from your chat messages? options:
- { text: "It's too expensive to retrain", correct: false }
- { text: "Inference doesn't update model weights — the model is frozen", correct: true }
- { text: "The API blocks learning for privacy", correct: false } feedback: During inference, the model only predicts tokens — it never updates its weights. Training is a separate, expensive process.