Zero to Deep learning

I am embarking on an ambitious learning project for the next 3 months:
learning deep learning and the mathematics behind it by studying only online resources such as fastai, MIT OpenCourseWare and YouTube.


Why deep learning?

Deep learning is cool! It’s like particle physics in the early 20th century crossed with the advent of personal computers in the late 70s. New and exciting frontiers of science are opening up to help discover the fundamental building blocks of intelligence, and the barrier to entry is just a computer with access to the internet.

Every day there is a new article about “AI” being implemented to revolutionise everything from Snapchat filters to driverless cars. It is the next big thing. Deep learning gives you the power to create applications that were practically impossible with handcrafted systems. Not only that, the current incarnation of the field is only a few years old. What an exciting prospect, to be able to get into a new and powerful intellectual field when things are just taking off. It feels like a unique moment in human history.

Starting from Zero

Deep learning is supposed to be a very difficult and math-heavy field. To really understand it you should have a good foundation in the related mathematics subjects of linear algebra, calculus and probability. Then, to get your hands dirty, you need programming experience so that you can implement deep learning models.

I am a JavaScript developer by trade, so I think I pass the programming prerequisite. Three years ago, I was working the most boring job ever: stocktaking chemicals in university labs and putting the data into Excel. I was really motivated to do something else, so I quit my job and studied coding using freeCodeCamp for 8 months. I was lucky enough to get my first break at Forsite. Fast forward to now and I have been coding professionally for 2 years and I love it.

In the mathematics department, however, I am zero. My background isn’t in machine learning or mathematics. I studied Psychology for 4 years at university but only ever took a couple of papers in statistics. Maths felt very foreign to me until recently. I have never studied linear algebra at all and would look with horror and intrigue at the matrices on my friends’ assignments at university. It seemed impossible to learn higher level mathematics. Like, if you weren’t good in high school then it’s obviously not for you. You just don’t have a math brain, sorry. I mostly avoided the maths side of programming when I learnt it. I could code just fine without it!

However, a little while ago I discovered Jeremy Kun’s book, A Programmer’s Introduction to Mathematics (PIM). Reading PIM has given me a new perspective on what the process of understanding maths is actually like. The process of struggling through frustration to understand a mathematical concept or proof seems very similar to the addictive frustration-to-solution rhythm of programming. The amazing 3Blue1Brown has also opened my eyes to the beauty of maths. At this point, I am excited to learn more of it.


Deep learning and the mathematics involved are probably much more complex than web development and JavaScript. But I have been down this path before and I know it’s possible. That (and reading a few success stories on Medium) gives you confidence, during the long periods of frustration and confusion, that you will find a way out.

But how do you find your way out? When I first learnt to code I developed some strategies over time, like building projects, taking regular breaks and always searching Google. These worked, but I would still struggle to digest concepts. Overall the process was still a bit haphazard, with lots of forgetting and then relearning concepts.

This time around I want to take a more active and methodical approach to learning. There are lots of decisions I will make about what to learn and how to learn it (what is the best way to learn mathematics?). In writing this blog, I want to share my experience with what works for me. I am excited to add Anki, a spaced repetition tool, to my learning toolkit. More on that later.

Throughout the learning process there is always imposter syndrome. Some days you have a few eureka moments, solve some problems and really feel like you can learn anything. Other times it’s frustrating; nothing makes sense, your motivation wanes and the magnitude of what you still don’t know feels insurmountable. I still felt like an imposter some days doing JavaScript. So why not feel like an imposter doing deep learning?

So naïve

If you look at any job ad for a Deep Learning Research Engineer, it will probably say: “requires Masters / PhD in Statistics or Comp Sci”. You see a similar thing for your average developer jobs: “requires Bachelors in Comp Sci”. It seems like you need a PhD in machine learning to get one of these research jobs, or at least practical skills comparable to someone with a PhD. I don’t think you need a PhD to get a job, just like you don’t need a BSc in CompSci to get a developer job. But I do think you should have the practical skills.

Can you really learn in 3 months what takes a PhD student years?

I think you can learn what a Deep Learning PhD student knows outside of academia. Ok, I admit the goal is a little preposterous for 3 months. I am willing to budge on the time factor, but I don’t think you need 3 - 5 years to get there.

I may have gotten the wrong idea from Jeremy Howard and Rachel Thomas. I found their Deep Learning for Coders course in 2017 [6] and took from it the idea that it was possible to learn math even if you were not good at it before AND that it’s possible to make meaningful contributions to deep learning research even if you didn’t do a PhD in Math.

With that blissful ignorance about the difficulty we can happily skip into my plan ⬇️

🧳 The Plan

I hope to build a strong foundational knowledge in the practical methods for building deep learning systems and the mathematics involved. From there I want to build working state of the art models from scratch and understand them.

Beside each course you’ll see a number, which is my arbitrary Full Time Effort (FTE) metric: 1 point is the full focus of a university paper for 1 semester (~12 weeks). Of course it is arbitrary, but my idea behind this is to quantify, like in Scrum, the complexity of the content and the effort I will spend on each course. Basically, the less effort I assign to a course, the more of its content I will skip.

The plan comes out to about 7.4 points. By the above definition this is nearly twice the amount of work I can fit into 12 weeks. I expect I will learn what is more important to focus on and change how I allocate my effort as I learn more. Let’s see what happens…

Machine Learning and Deep learning

Course/Book                                                                        FTE   Cost
fastai 2019
- Part 1: Practical Deep Learning for Coders                                       1.5
- Part 2: Deep Learning from the Foundations                                       1.5
- Deep learning specialization [1]                                                 0.5   $45
Jeremy Kun
- A Programmer’s Introduction to Mathematics (book) [2]                            0.5
- 18.01 Single Variable Calculus                                                   0.2
- 18.02 Multivariable Calculus                                                     0.4
- 18.06 Linear Algebra                                                             0.7
- 18.065 Matrix Methods in Data Analysis, Signal Processing, and Machine Learning  0.7
- Essence of Linear Algebra & Calculus                                             0.5
Khan Academy
- Linear Algebra & Calculus                                                        0.3
- Swift - iOS app development [3]                                                  0.3   $11
- Python for data science and machine learning [4]                                 0.3   $11
GPU server instance - AWS p2 instance, <200 hours [5]                                    ~$300
Total                                                                              7.4   ~$367
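
As a quick sanity check, the totals in the table can be re-added in a few lines of Python. This is only a sketch: the labels are shortened, the numbers are copied from the rows above, and `PAPERS_PER_SEMESTER = 4` is my own assumption about what a full-time semester looks like (the post only says the plan is "nearly twice" what fits into 12 weeks).

```python
# (label, FTE points, cost in dollars) copied from the plan table.
# Labels are shortened; rows without a listed cost are entered as 0.
plan = [
    ("fastai Part 1", 1.5, 0),
    ("fastai Part 2", 1.5, 0),
    ("Deep learning specialization", 0.5, 45),
    ("Programmer's Introduction to Mathematics", 0.5, 0),
    ("18.01 Single Variable Calculus", 0.2, 0),
    ("18.02 Multivariable Calculus", 0.4, 0),
    ("18.06 Linear Algebra", 0.7, 0),
    ("18.065 Matrix Methods", 0.7, 0),
    ("Essence of Linear Algebra & Calculus", 0.5, 0),
    ("Khan Academy Linear Algebra & Calculus", 0.3, 0),
    ("Swift - iOS app development", 0.3, 11),
    ("Python for data science", 0.3, 11),
    ("AWS p2 GPU instance", 0.0, 300),
]

# Assumption (not stated in the post): a full-time load is 4 papers per semester.
PAPERS_PER_SEMESTER = 4

# Round to one decimal to hide floating-point noise in the sum.
total_fte = round(sum(fte for _, fte, _ in plan), 1)
total_cost = sum(cost for _, _, cost in plan)
workload_ratio = total_fte / PAPERS_PER_SEMESTER

print(f"Total effort: {total_fte} points")
print(f"Total cost: about ${total_cost}")
print(f"Workload vs. one full-time semester: {workload_ratio:.2f}x")
```

Under that assumption the ratio lands a little under 2, which matches the "nearly twice" estimate.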

📜 Do you get a certificate?

I get asked this question a lot by friends and family when I tell them my plan.

The answer is no. While I will get a certificate for completing a course (because I paid), I don’t trust that this will be enough proof for anyone, especially as this certificate will only account for a small amount of my overall learning.

When I first learnt to code, my certificate was my projects. At the time, people had not heard of freeCodeCamp, so their certificates were not worth much, and I only earned one of a possible 4 certificates! The projects I built during the freeCodeCamp curriculum were, however, undeniable proof of my competency. GitHub was the proof. You can hire me to write JavaScript to build websites, because here, I have built websites using JavaScript.

Rightly or wrongly, I will follow this philosophy for deep learning. Building models in production with code on GitHub, implementing papers in code and technical writing will be my certificate of proficiency for deep learning.

Why not do a Masters?

Good question. I would have the chance to basically do what I am doing but in a more supported environment with capable peers to collaborate with and learn from etc.

I am not sure if I could get straight into a program without any compsci degree. Apart from that, there are three reasons:

The cost
A Masters in NZ would cost at least 10k, plus the opportunity cost of one year’s income. If I study online, I can cut this cost greatly and probably get access to comparable course content from places like MIT OpenCourseWare.

Low risk
Three months of study won’t cost much and it is less of a risk. There are a lot of unknowns when switching to a new field of programming. In three months, I can figure out if I actually like machine learning. Committing to a whole year of a Masters degree is a big gamble when I am unsure if it is a good fit (ha). In the end, if it’s not for me, well, I can get a job writing JavaScript again 🤷‍♂️

The cutting edge is online
I don’t want to learn MATLAB or R. It seems to me that university academia can very often be behind the cutting edge of computer science. Many of the things a CompSci grad needs for a developer job they will learn outside of uni.
I think we are in a unique moment in history: the majority of advanced machine learning content is available for free on the internet (e.g. Papers with Code).

🧠 Anki

Cultivate a taste in what to Ankify

— Michael Nielsen

I was super inspired by a great post by Michael Nielsen on augmenting your cognition.


I am excited to try and apply his strategies to my learning process over the next three months. Michael advocates using the Anki application and method to remember more of what you spend your time learning. Anki is an open source spaced repetition flash card app. You add flash cards with questions and answers. Each time you practice a card you answer how difficult you found it to remember (hard, ok, easy). Anki then uses an algorithm based on the forgetting curve to remind you to rehearse each flashcard at the optimal time to solidify it as a long-term memory.
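
To make the idea concrete, here is a toy spaced repetition scheduler in Python. It is not Anki’s actual algorithm: the `Card` class, the `review` function and all the constants are made up for illustration, loosely in the spirit of the SM-2 family of schedulers. It only shows the core mechanic, which is that each review stretches or shrinks the gap until the next one depending on how hard the card felt.

```python
from dataclasses import dataclass


@dataclass
class Card:
    question: str
    answer: str
    interval_days: float = 1.0  # gap until the next review
    ease: float = 2.5           # multiplier applied after a comfortable review


# Hypothetical tuning constants for this sketch; not Anki's real values.
EASE_DELTA = {"hard": -0.15, "ok": 0.0, "easy": 0.15}
MIN_EASE = 1.3
HARD_MULTIPLIER = 1.2


def review(card: Card, rating: str) -> Card:
    """Update a card in place after one review and return it."""
    if rating not in EASE_DELTA:
        raise ValueError(f"rating must be one of {sorted(EASE_DELTA)}")
    # Harder cards lose ease, easier cards gain it, with a floor.
    card.ease = max(MIN_EASE, card.ease + EASE_DELTA[rating])
    if rating == "hard":
        # A hard card still moves out, but only slightly.
        card.interval_days *= HARD_MULTIPLIER
    else:
        # A comfortable card jumps out by its ease factor.
        card.interval_days *= card.ease
    return card


card = Card("What does Anki schedule?", "Reviews at growing intervals")
for rating in ["ok", "easy", "hard"]:
    review(card, rating)
    print(rating, round(card.interval_days, 2), round(card.ease, 2))
```

Running the loop shows the interval growing quickly after "ok" and "easy" answers and only creeping forward after a "hard" one, which is the behaviour the forgetting curve argument calls for.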

My main goal in studying over the next 3 months is to remember what I learn. As Michael says: “Anki makes memory a choice, rather than a haphazard event, to be left to chance”. It is a seductive idea: by using Anki you can have a better memory and therefore be more efficient with the time you spend learning.

My biggest takeaway from reading Michael’s post was his advice for starting to use Anki, and I would repeat it for anyone who is interested in trying it out: “Cultivate a taste in what to Ankify”. You should try to learn what kind of information works well as Anki cards, when in the learning process you should add it to Anki, and how best to write cards for different types of information. During my learning journey I plan to put a lot of time into experimenting with the best ways to use Anki to make my learning more efficient.

I hope to share my experiences with using Anki in future.

Why did I write this post?

I was a little terrified to write this blog post (so many revisions 😫). My main motivation comes from the advice of Rachel Thomas and Jeremy Howard: share your knowledge even if you are a beginner in a subject.

You have a unique perspective that is valuable for someone who is just behind you in the learning process

If you are thinking about breaking into deep learning, I think you can do it.

Thanks for reading, hit me up on twitter ✌️

[1] I intend to take the course as a supplement to the fastai practical deep learning courses.
[2] This book got me excited about the ideas and magic of mathematical proofs and the beauty behind them. It has helped to change my view of mathematics. I think of it as a foundational perspective on the subject, rather than a rigorous course to teach me many subjects.
[3] Swift for TensorFlow seems important, so I should learn Swift.
[4] To fill out some of my practical knowledge of the Python data science ecosystem (matplotlib, NumPy, etc.).
[5] Unknown how much this will cost, as you can also use Google Colab with a GPU/TPU for free!
[6] When I found the course I was actually searching for resources to learn about AWS, and the first course included a lot of fiddly bits about setting up a GPU on AWS, which I loved.

Published 2 Jul 2019

learning how to learn
Patrick McCaffrey on Twitter