What is data science?
The goal of data science is to extract knowledge and derive meaningful insights from data that is often noisy and may be structured or unstructured. It draws mainly on techniques from statistics and computer science.
At the heart of data science is the data-generating process, i.e., some phenomenon or truth in the real world that has observable consequences. Data is derived from these observed consequences, and data science is the set of principles and techniques used to reconstruct the truth from the data.
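To make the idea of a data-generating process concrete, here is a minimal Python sketch. It assumes a made-up "truth" (a straight line with slope 2 and intercept 1, plus Gaussian noise), simulates noisy observations from it, and then tries to recover the slope and intercept from the data alone using ordinary least squares. The function names, parameter values, and the choice of a linear process are illustrative assumptions, not something prescribed by any particular method.

```python
import random


def generate_data(n, slope, intercept, noise_sd, seed=0):
    """Simulate a hypothetical process: y = slope * x + intercept + noise."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    xs = [rng.uniform(0.0, 10.0) for _ in range(n)]
    ys = [slope * x + intercept + rng.gauss(0.0, noise_sd) for x in xs]
    return xs, ys


def fit_line(xs, ys):
    """Estimate slope and intercept by ordinary least squares."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# The "truth" (assumed here for illustration; unknown in a real problem).
TRUE_SLOPE, TRUE_INTERCEPT = 2.0, 1.0

# Observable consequences of the truth: noisy data.
xs, ys = generate_data(n=1000, slope=TRUE_SLOPE, intercept=TRUE_INTERCEPT, noise_sd=1.0)

# Reconstructing the truth from the data.
est_slope, est_intercept = fit_line(xs, ys)
print(f"estimated slope = {est_slope:.2f}, estimated intercept = {est_intercept:.2f}")
```

With enough observations, the estimates should land close to the true values, though never exactly on them because of the noise. In real problems the truth is unknown, so we cannot check the answer this way; simulation is simply a convenient way to see the idea at work.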
Is data science another name for statistics? Yes and no, but let’s not get into another unproductive debate. Let’s get into learning data science instead.
What is this website for?
The aim of this website is to connect the theoretical and the practical aspects of data science. Most resources on data science cover only the practical part while leaving out the theory behind the practice. Others focus on the theory with no practical applications. The idea is to lay down the theoretical foundations and apply them to real-world data. This way, the reader will gain a holistic understanding of how data science is done and a broad appreciation of the problems and issues data scientists might encounter on their journey.
Prerequisites
Although we will try to build ideas from the ground up, some of the topics will require basic knowledge of calculus and linear algebra. For data analysis, we will be using R, although the same analyses can be performed using SPSS, SAS, Python, or any other programming language. For machine learning, we will be using Python, but R, C++, MATLAB, or other programming languages will work as well.