The internet is overflowing with resources for learning new programming skills. For thriving disciplines like natural-language processing (NLP), you can find plenty of tutorials, video series, and university lectures online. All of these formats can be great ways to get you started. But when you want to get a truly deep understanding of a new topic, nothing beats a good book. For this article, we’ve compiled a list of our all-time favorite books that you should have in your pack before embarking on your NLP journey.
Why You Should Read Books
Some research suggests that students understand the details of a process better when they learn about it from printed books rather than by reading about it online. While reading a book typically takes much longer than watching a tutorial on YouTube, the time that you put into it pays dividends: You’ll spend more time thinking about the subject and come away with a more in-depth understanding.
And, while hard copies of academic books can be expensive, some of the NLP books on this list are available for free. Thanks to the open-source spirit in the coding community, many authors offer their books in both paid and free versions, allowing you to read what some of the greatest minds in NLP have carefully put together.
What is NLP?
NLP is a field of computer science that aims to make human language processable by machines. A closely related term is computational linguistics (CL). While there is a subtle difference between the two (CL leans toward theory and research, whereas NLP focuses on practical applications), the terms are often used interchangeably.
If you want to learn more about NLP and why it has become such a hot topic in AI, read our blog post What Is Natural Language Processing.
Six of the Best NLP Books
Natural Language Processing with Python
The Natural Language Toolkit (NLTK) is a dinosaur among Python NLP libraries. It’s relatively old and somewhat clunky — but it’s still quite popular among NLP practitioners, who cherish it for its comprehensive and robust nature. Natural Language Processing with Python is the NLTK study guide, written by its authors Steven Bird, Ewan Klein, and Edward Loper. Commonly referred to as “the NLTK book,” it almost exclusively focuses on practical examples designed to teach you how to use the library, while also getting you acquainted with the most important NLP concepts.
The lack of theory and math formulas makes this book extremely accessible to beginners. In fact, one of the book’s purposes is to teach programming with Python through an NLP lens. What we really appreciate about this book’s approach is that the authors give center stage to the task of working with data corpora — an aspect often forgotten in more theory-heavy books, but one that constitutes a huge part of the work as an NLP practitioner.
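To give a flavor of the corpus work the book emphasizes, here is a minimal sketch that counts word frequencies in a tiny in-memory corpus. It uses only Python's standard library rather than NLTK itself. The sample sentences and the `word_frequencies` helper are made up for illustration, and NLTK offers far richer tools for the same job.

```python
import re
from collections import Counter

# A toy "corpus": in practice this would be loaded from files or a corpus reader.
TOY_CORPUS = [
    "The cat sat on the mat.",
    "The dog chased the cat.",
]


def word_frequencies(documents):
    """Count lowercase word tokens across all documents (hypothetical helper)."""
    counts = Counter()
    for doc in documents:
        # Very rough tokenization: runs of letters and apostrophes only.
        counts.update(re.findall(r"[a-z']+", doc.lower()))
    return counts


freqs = word_frequencies(TOY_CORPUS)
print(freqs.most_common(2))  # [('the', 4), ('cat', 2)]
```

Even on two sentences, the most frequent token is a function word ("the"), which hints at why practitioners spend so much time inspecting and cleaning corpora before modeling.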
Speech and Language Processing
On the more theory-heavy side is the classic Speech and Language Processing by Dan Jurafsky and James H. Martin. Despite its modest title, this book covers all the NLP topics you could wish for, and then some. The authors don’t shy away from linguistic concepts such as dependency parsing and constituency grammars, and they also cover topics popular in industry, such as chatbots and machine translation.
The online version of the book has been updated to keep pace with recent academic developments. It includes several chapters on neural networks and expanded sections on question-answering systems. Every chapter works as a self-contained unit with references and exercises or historical notes. The authors expect their readers to come equipped with some foundational knowledge of linguistics, math, and computer science.
Statistical Methods for Speech Recognition
Most publications about NLP are implicitly biased towards the processing of written language, but spoken language poses challenges of its own. Statistical Methods for Speech Recognition by Frederick Jelinek is entirely devoted to the problem of speech recognition using computational methods, a fundamental problem in the ever-growing field of voice-assisted applications.
The chapters build on each other, with the first part of the book covering statistical and methodological basics. It then builds towards increasingly sophisticated implementations for speech recognition. The density of the book makes for an informative read, but it does assume some prior math knowledge, especially probability theory.
The Handbook of Computational Linguistics and Natural Language Processing
The Handbook of Computational Linguistics and Natural Language Processing, edited by Alexander Clark, Chris Fox, and Shalom Lappin, is a well-curated compilation of essays by experts from different NLP domains. The volume starts out with an introduction to formal languages and ends with discussions of intricate NLP applications, such as question answering, natural language generation, and discourse parsing.
In between, Martha Palmer and Nianwen Xue cover the practical side of computational linguistics in their chapter on the annotation of text corpora.
Although the book is a bit dated (it was published in 2010), its diverse collection of topics and authors makes it a valuable source of information and a good starting point for understanding a wider range of NLP-related topics.
Linguistic Fundamentals for Natural Language Processing
While some NLP applications have processing pipelines that accept raw text as input, others expect a certain degree of preprocessing. Preprocessing the text requires at least some knowledge of linguistic concepts and how they apply in NLP.
For folks who are interested in NLP, but lack a linguistic background, Linguistic Fundamentals for Natural Language Processing by Emily M. Bender comes to the rescue. It introduces all the main concepts from syntax and morphology, with a focus on how these can help improve NLP applications.
The volume is part of the Synthesis Lectures on Human Language Technologies series by Morgan & Claypool. The series includes publications on many other interesting topics, such as automatic error detection and discourse parsing.
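To make the idea of preprocessing mentioned at the start of this section more concrete, here is a minimal sketch of a typical first step: lowercasing, tokenizing, and removing stop words. It is not taken from the book. The `preprocess` helper and the tiny `STOP_WORDS` set are hypothetical, and real pipelines usually rely on a dedicated NLP library and language-specific rules.

```python
import re

# Tiny illustrative stop-word list (an assumption); real lists are much larger.
STOP_WORDS = {"a", "an", "the", "of", "is", "to"}


def preprocess(text):
    """Lowercase, tokenize, and drop stop words (hypothetical helper)."""
    # \w+ matches runs of letters, digits, and underscores; punctuation is dropped.
    tokens = re.findall(r"\w+", text.lower())
    return [tok for tok in tokens if tok not in STOP_WORDS]


print(preprocess("The structure of a sentence is key to its meaning."))
# ['structure', 'sentence', 'key', 'its', 'meaning']
```

Even this simple step embeds linguistic decisions, such as what counts as a word and which words carry little meaning, which is where some background in morphology and syntax helps.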
Neural Network Methods for Natural Language Processing
No list on NLP books would be complete without a resource for deep learning. This computationally intense, mathematically intricate discipline has graced us with some of the most stunning NLP applications in recent years, from translation systems to automatic image captioning.
Yoav Goldberg, the author of Neural Network Methods for Natural Language Processing, is a professor at Israel’s Bar-Ilan University and has published many academic papers on NLP with neural nets. He makes sure to cover the basics (such as supervised learning, deep learning, and the challenges of working with language data) before moving on to increasingly complex neural architectures.
Alternative Resources for Studying NLP
Arguably one of the greatest things about studying computer science is the accessibility of most of the material. If you want to learn about the most recent developments in NLP, just head over to arXiv, a widely used open archive for academic papers.
Researchers from various disciplines upload their own papers to this online archive, many of which were written so recently that they haven’t even gone through the process of peer review required for publication in a scientific journal.
Reading complex academic papers may seem intimidating at first, but it’s a great way to stay updated on what’s happening in a field evolving as fast as NLP, and it turns out most papers are not very long.
In this article, we introduced some of the most important books on natural language processing. We talked about why it’s good to integrate reading books into your self-study schedule, and we showed you how to find the latest research on the topics you’re interested in.
Do you now feel equipped to learn exciting new things? Enroll in our Nanodegree program to become a Natural Language Processing Engineer!