5 Books Every Data Scientist Should Read
Apr 12, 2023
There are a lot of excellent reading materials out there to help you master different data science concepts, but that can make it difficult to know which books are worth your time and money.
Listing every book that I have found useful in my data science career is too much for a single blog post. That’s why in this post I want to share with you books that I found helpful specifically for learning about statistics and machine learning.
These are some of my all-time favorites and books that I feel deserve more love than they get. Whether you’re a beginner or an experienced data scientist, these reads have a lot to offer anyone looking to expand their understanding of statistics and machine learning.
Before we get started though, be sure to head to my YouTube channel if you would prefer to get these recommendations through a video instead of a blog.
Now I want to begin with books that are helpful for statistics.
OpenIntro Statistics
For those of you who are new to the field or who want a review of all the essential statistics concepts, OpenIntro Statistics has you covered.
Yes, it is a huge book, but don’t be intimidated! OpenIntro Statistics is a beginner-friendly textbook, which means that it’s easy to read and understand, particularly if you have entry-level knowledge of statistics already. There are lots of diagrams to help you visualize concepts, and I found the content to be written in an engaging manner. It’s genuinely a good read!
I recommend this book both to people looking to gain a foundation in statistics and to those who want a great resource on hand for refreshing their knowledge. With explanations of everything from the axioms of probability to distributions, hypothesis testing, and regression, there is something for everyone to benefit from.
This book is also a great fit for anyone with a busy schedule. It’s not an overly technical or difficult read, so you can learn a lot just from reading a few pages at a time when you have the time to spare.
That’s not all though. Perhaps one of the best things about this book is that you can download the e-book for free. There’s no reason not to give it a try, and I think that, like me, you will find yourself returning to this book every few months because it is such a great review tool and reference.
Mathematical Statistics with Resampling and R
OpenIntro Statistics is wonderful, but it is a beginner-friendly textbook, and you also need options for diving deeper. Mathematical Statistics with Resampling and R by Laura M. Chihara and Tim C. Hesterberg is another textbook, but one that takes an intermediate-level approach to mathematical statistics using the resampling method and the R programming language.
This book was recommended to me by my friend Yuan, who is one of the smartest data scientists I know and currently works at DoorDash as a machine learning data scientist. I soon found it to be a fantastic recommendation on their part!
Covering everything from probability theory to hypothesis testing, confidence intervals, linear regression, and more, this book is a comprehensive textbook designed for undergrad stats majors. It’s both in-depth and practical and places emphasis on how to use the resampling method for statistical analysis.
Not only does this book teach you skills, it also lets you practice. There are exercises, examples, and R code, so you can immediately try out what you are learning. With everything this book has to offer, I was able to go from having just a vague idea of what resampling methods were to using them in my own work. I highly recommend it for those looking to expand their knowledge and gain practical skills.
With these two books, you have both a beginner and an intermediate option to hone your knowledge of statistics. Now, I want to share three books to help you do the same with machine learning.
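If resampling is a vague idea for you too, a tiny example may help. The sketch below is not taken from the book (which uses R); it is a minimal Python illustration of one common resampling technique, the percentile bootstrap, applied to the mean of a small made-up sample. The function name `bootstrap_ci`, its parameters, and the data are all assumptions chosen for illustration.

```python
import random
import statistics


def bootstrap_ci(data, stat=statistics.mean, n_resamples=5000, alpha=0.05, seed=0):
    """Percentile bootstrap confidence interval for `stat` (hypothetical helper)."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    n = len(data)
    # Resample the data with replacement many times and recompute the statistic.
    estimates = sorted(
        stat(rng.choices(data, k=n)) for _ in range(n_resamples)
    )
    # Take the empirical alpha/2 and 1 - alpha/2 quantiles of those estimates.
    lo_idx = int((alpha / 2) * n_resamples)
    hi_idx = int((1 - alpha / 2) * n_resamples) - 1
    return estimates[lo_idx], estimates[hi_idx]


# Made-up measurements, for illustration only.
sample = [12.1, 9.8, 11.4, 10.3, 13.7, 9.1, 10.9, 12.6, 11.0, 10.2]
low, high = bootstrap_ci(sample)
print(f"sample mean: {statistics.mean(sample):.2f}")
print(f"95% bootstrap CI: ({low:.2f}, {high:.2f})")
```

The appeal of this approach is that it needs no formula for the standard error: the spread of the resampled means stands in for it.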
The Hundred-Page Machine Learning Book
The Hundred-Page Machine Learning Book by Andriy Burkov is an excellent place to start your machine learning journey. It’s a clear summary of the key concepts and ideas of machine learning in a little over 100 pages, making it both a concise way to learn the fundamentals and a great resource for quick review.
One thing that I particularly liked and that makes this book unique is the way it pieces things together. Rather than looking at all the different machine learning algorithms individually, the book examines the similarities and differences between them so that you can see how they relate to each other and better understand them.
For example, the book provides an incredibly useful summary of the three parts of a machine learning algorithm. Knowing those gives you a lens for understanding all algorithms and makes it easier to learn new ones.
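To make that concrete, here is a minimal sketch. It assumes the three parts are a loss function, an optimization criterion built on that loss (a cost function), and an optimization routine. Treat that breakdown, along with every name in the code, as an illustration rather than a quote from the book. The example fits a straight line with plain gradient descent on made-up data.

```python
def squared_error(prediction, target):
    """Part 1 (assumed): the loss for a single example."""
    return (prediction - target) ** 2


def cost(w, b, xs, ys):
    """Part 2 (assumed): the optimization criterion, here the mean loss."""
    return sum(squared_error(w * x + b, y) for x, y in zip(xs, ys)) / len(xs)


def gradient_descent(xs, ys, learning_rate=0.05, steps=2000):
    """Part 3 (assumed): the optimization routine that minimizes the cost."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(steps):
        # Gradients of the mean squared error with respect to w and b.
        grad_w = sum(2 * (w * x + b - y) * x for x, y in zip(xs, ys)) / n
        grad_b = sum(2 * (w * x + b - y) for x, y in zip(xs, ys)) / n
        w -= learning_rate * grad_w
        b -= learning_rate * grad_b
    return w, b


# Made-up data that follows y = 2x + 1 exactly.
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1.0, 3.0, 5.0, 7.0, 9.0]
w, b = gradient_descent(xs, ys)
print(f"w = {w:.3f}, b = {b:.3f}, cost = {cost(w, b, xs, ys):.6f}")
```

Swapping out any one of the three pieces, for instance replacing the squared error with a different loss, gives you a different algorithm while the overall shape stays the same.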
If this sounds helpful to you but you aren’t quite sure you want to commit to buying it, you can actually download some of the book chapters for free as a trial before buying the book. Check out the website to learn more.
Machine Learning with PyTorch and Scikit-Learn
Written by experts in the machine learning field, Machine Learning with PyTorch and Scikit-Learn is a great guide for expanding your knowledge. The book will teach you everything you need to know to use PyTorch and Scikit-Learn for machine learning.
For me, this book was particularly useful because it places great emphasis on implementation. There are step-by-step implementations of various algorithms, which is great for really understanding what’s happening. Lots of exercises and coding examples throughout also give you the opportunity to practice and apply everything the book teaches.
That’s not the only reason I like this book though! The authors provide clear explanations of concepts and algorithms and use real-world examples to show how these tools can be applied in practice. They also dive into advanced machine learning topics like deep learning, convolutional neural networks, and natural language processing. There’s plenty to learn here, and it’s all presented in a well-written and easy-to-follow format.
All that is why this book is a top recommendation from me for anyone who wants to learn how to use these powerful Python libraries for machine learning. It will definitely increase your machine learning skills.
Designing Machine Learning Systems
The final book I want to recommend is a very practical one called Designing Machine Learning Systems: An Iterative Process for Production-Ready Applications by Chip Huyen.
This book is a must-read for anyone who truly wants to become a data scientist or machine learning engineer. It takes a holistic view of machine learning systems. Instead of looking at algorithms in detail, this book focuses on how to scope a machine learning project, process data, debug models, and productionize them.
In other words, the book shows you what you actually do as a machine learning practitioner. You will gain a clear understanding of the day-to-day work involved. That’s incredibly valuable knowledge for making informed decisions about your career and why I can’t recommend this book enough to anyone seriously interested in working in the machine learning field.
Don’t think that means that this book isn’t a great resource for those already working in the field! It can increase your knowledge of other areas of the system such as data engineering, model deployment, and MLOps.
For instance, one valuable thing this book taught me is that employers ultimately care about business metrics. They don’t care as much about fancy machine learning terms, so you always want to be able to communicate how your work is impacting the business. Even though I already worked in the field, understanding this helped me to be even better at my job.
Designing Machine Learning Systems: An Iterative Process for Production-Ready Applications also comes with a GitHub repository with practical blog posts on the most up-to-date best practice information available in the industry. This means you can continue learning and stay updated even after you’ve finished the book!
Overall, this book is an excellent resource for anyone who wants to work or is already working in machine learning, and I highly recommend adding it to your collection.
Final Thoughts
Being a data scientist is all about learning. Whether you are looking for a job or working in the field already, you can’t afford to let your skills grow dull. Having some great resources on hand can help you expand and refresh your knowledge. I hope that some of the book recommendations from this article will help you build your knowledge base and advance your career!