3 Data Analysis Beginner Mistakes I Have Made + Lessons Learned
Mistake #1: Reinventing the wheel
When I was working on my first project, I thought that I had to write every line of code from scratch. This is a reasonable assumption for a beginner, but it is mistaken. More likely than not, what you’re trying to do has already been done by someone, and they have shared their code on GitHub for others to use. Better yet, often people publish tutorials outlining their process from start to finish. These are extremely helpful and can be found with a well-worded Google search. My favorite resource for these tutorials is TowardsDataScience on Medium.com. Lesson learned: Before writing code from scratch, Google to see if someone has already shared the best approach (and give proper credit!).
Mistake #2: Working in a code editor instead of Jupyter Notebook
When first learning to code, it is common to work in a code editor such as Atom or Sublime Text. However, for those working with data, Jupyter Notebook is a widely used alternative. According to jupyter.org, “Jupyter Notebook is an open-source web application that allows you to create and share documents that contain live code, equations, visualizations and narrative text. Uses include: data cleaning and transformation, numerical simulation, statistical modeling, data visualization, machine learning, and much more.”
In simple terms, Jupyter Notebook is like a code editor with additional helpful functionalities. For example:
- Write and run pieces of code separately in cells, instead of as one big script
- Revert to a previously saved checkpoint (a lightweight snapshot, not full version control)
- Add annotations and markups for readability
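To make the first point concrete, here is a minimal sketch of a script split into two cells. The `# %%` markers are a convention that tools such as VS Code, Spyder, and Jupytext read as cell boundaries; in a notebook, each section would simply be its own cell. The `load_data` function and its numbers are made up for illustration and stand in for any slow step.

```python
# %% Cell 1: the slow step, run once
import statistics


def load_data():
    # Hypothetical stand-in for an expensive step, such as reading a large
    # file or calling an API.
    return [12, 15, 11, 19, 14]


data = load_data()

# %% Cell 2: the analysis, cheap to re-run while iterating
# `data` is still in memory from Cell 1, so editing and re-running this cell
# does not repeat the slow step above.
mean_value = statistics.mean(data)
print(mean_value)
```

Because the two cells run independently, a bug in the analysis can be fixed and re-run without reloading the data.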
I wrote much of my first project in Atom before switching to Jupyter Notebook, so I had to move all of my code over and re-run it there. This was tedious and time-consuming, especially because the move required many bug fixes and, while making them, I got blocked from an API, which leads me to mistake #3. Lesson learned: Work in Jupyter Notebook instead of a code editor.
Mistake #3: Getting blocked from an API
Every API call costs the API's owner something, since each request and each response uses computing resources and bandwidth. For this reason and others, many APIs impose rate limits on the number of calls a user can make, or the amount of data a user can receive, in a given period of time. Not all APIs have rate limits, but owners will not hesitate to block excessive users. I admittedly was one of those users: while working on my first project, I re-ran my API call over and over to debug it after moving it from Atom to Jupyter Notebook (see mistake #2). I got blocked, so I had to rewrite the code to call a different API. This was a frustrating and time-consuming mistake that could have been prevented. Now I write each API call as a function with a parameter specifying the amount of data to request, so that I can debug with small requests and ask for the full amount only once the code works. Lesson learned: Test API calls with small data requests by writing calls as functions with a parameter specifying the amount of data requested.
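Here is a minimal sketch of what that can look like, using only Python's standard library. The endpoint `https://api.example.com/records` and the `limit` query parameter are assumptions made up for illustration; real APIs name these differently (for example `per_page` or `count`), so check the documentation of the API you are calling.

```python
import json
import urllib.parse
import urllib.request

# Hypothetical endpoint, for illustration only. Substitute the real API's URL.
BASE_URL = "https://api.example.com/records"


def build_url(limit, base_url=BASE_URL):
    """Return a request URL asking for at most `limit` records."""
    if limit < 1:
        raise ValueError("limit must be at least 1")
    # "limit" is an assumed parameter name; real APIs may use another one.
    return base_url + "?" + urllib.parse.urlencode({"limit": limit})


def fetch_records(limit=5, base_url=BASE_URL, opener=urllib.request.urlopen):
    """Request at most `limit` records and return the parsed JSON.

    The default is deliberately small so that debugging runs stay cheap.
    `opener` can be swapped out to exercise the function without a network.
    """
    with opener(build_url(limit, base_url), timeout=10) as response:
        return json.loads(response.read().decode("utf-8"))
```

While debugging, call `fetch_records(limit=5)` as many times as needed, and raise `limit` only once the rest of the code works. Keeping the small value as the default means that forgetting the argument results in a small request rather than a large one.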