
How to Start Your Data Scientist Journey


(Read on only if you're interested in getting into Data Science, or tag someone you know who is.)

Every day, I get messages from people on LinkedIn/Fb who are passionate about learning Data Science but don't know how to start because they have no prior knowledge of the field. The internet has a lot of useful resources for getting into this field, and I thought I'd take the time to share a few based on my little experience in this domain.

The following is one possible list of steps you can follow if you want to jump into learning Data Science.

1. The first and most important thing is to make your core machine learning concepts strong. I believe that just as OOP concepts are the basic pillars for a Software Developer, ML concepts are the basic pillars for a Data Scientist. Therefore, this course from Andrew Ng on ML should be your first step. It is 11 weeks long, but it will build your ML base. And as most of you already know, you can get this course for free just by applying for financial aid.

https://www.coursera.org/learn/machine-learning

2. However, the financial aid can take up to 15 days for verification. Hence, I would suggest using that time to learn the basics of Data Science. What is it? What's all the hype about? What's the difference between machine learning, data science, deep learning, data mining, artificial intelligence, etc.? To clear up all these concepts, there is a free Data Science career path on Cognitive Class. These are short 3-5 hour courses, so you can easily finish them in a week. They will also help you decide whether you want to join this field or not.

https://cognitiveclass.ai/courses/data-science-101/
https://cognitiveclass.ai/courses/data-science-hands-open-source-tools-2/
https://cognitiveclass.ai/courses/data-science-methodology-2/

3. Now that you know what the hype is about and you've established your base in ML, it's time to get some hands-on experience. I would suggest these two introductory courses, one in R and one in Python, for hands-on Data Science experience. This way you can also figure out whether you're more comfortable with R or Python. I suggest these courses from DataCamp because they make you do exercises after every small topic. So yes, you'll finally get a chance to get your hands dirty with Data Science and the key libraries required in Python and R.

https://www.datacamp.com/courses/introduction-to-r-for-data-science-edx
https://www.datacamp.com/courses/intro-to-python-for-data-science

4. However, I don't believe those intro courses are enough. The DataCamp website has full Data Scientist career tracks with R/Python, each consisting of 20+ short courses. Unfortunately, they are paid. So, if you're passionate enough, you can spend a few dollars and complete them to get some practical experience in this field. I highly recommend this, especially for people who are currently employed and earning.

https://www.datacamp.com/tracks/data-scientist-with-python
https://www.datacamp.com/tracks/data-scientist-with-r

But if you want a free alternative, you can go for this specialization in Python for data science from Coursera. However, I find it too lengthy and too theoretical.

https://www.coursera.org/specializations/data-science-python

5. Once done, I would recommend exploring Kaggle.com. This site contains tons of datasets and competitions. Analyze other people's code and how they work with these datasets, and try to follow in the footsteps of the experts. See which algorithms they use for image recognition and how they handle specific types of problems. This way you'll learn the basic, mainstream solutions to the majority of use cases. You can also try working with the datasets yourself to assess your skills.

6. Nowadays, it is a must to have at least conceptual knowledge of deep learning. And it is even better if you have experience working with the highly in-demand TensorFlow library. To catch up on this, you can take the following two courses. The first one is comparatively easy, but the second one will give you a good command of TensorFlow.

https://cognitiveclass.ai/courses/introduction-deep-learning/
https://cognitiveclass.ai/courses/deep-learning-tensorflow/
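
To make the self-assessment idea from step 5 a bit more concrete, here is a minimal sketch of the habit I would suggest when you pick up any dataset: hold out some test data, measure a trivial baseline, and only then compare a real model against it. Everything in it is an assumption for illustration: the toy dataset is generated in code as a stand-in for a downloaded one, the helper names (make_toy_dataset, train_test_split, majority_baseline, nearest_neighbour, accuracy) are my own, and the 1-nearest-neighbour model is just a simple example rather than what the experts on Kaggle would use. It needs only the Python standard library.

```python
import random


def make_toy_dataset(n=200, seed=0):
    """Hypothetical stand-in for a downloaded dataset: two features, binary label."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        label = rng.randint(0, 1)
        # Class 1 points are centred at (2, 2), class 0 points at (0, 0).
        center = 2.0 if label == 1 else 0.0
        features = (rng.gauss(center, 1.0), rng.gauss(center, 1.0))
        rows.append((features, label))
    return rows


def train_test_split(rows, test_fraction=0.25, seed=0):
    """Shuffle a copy of the rows and hold out a fraction for testing."""
    rng = random.Random(seed)
    shuffled = rows[:]
    rng.shuffle(shuffled)
    cut = int(len(shuffled) * (1 - test_fraction))
    return shuffled[:cut], shuffled[cut:]


def majority_baseline(train):
    """Return a predictor that always answers with the most common training label."""
    labels = [y for _, y in train]
    majority = max(set(labels), key=labels.count)
    return lambda x: majority


def nearest_neighbour(train):
    """Return a predictor that copies the label of the closest training point."""
    def predict(x):
        # Squared Euclidean distance is enough for picking the closest point.
        _, label = min(
            train,
            key=lambda row: (row[0][0] - x[0]) ** 2 + (row[0][1] - x[1]) ** 2,
        )
        return label
    return predict


def accuracy(predict, rows):
    """Fraction of rows whose label the predictor gets right."""
    return sum(predict(x) == y for x, y in rows) / len(rows)


rows = make_toy_dataset()
train, test = train_test_split(rows)
baseline_acc = accuracy(majority_baseline(train), test)
knn_acc = accuracy(nearest_neighbour(train), test)
print(f"majority baseline:   {baseline_acc:.2f}")
print(f"1-nearest neighbour: {knn_acc:.2f}")
```

If your model cannot clearly beat the majority baseline on the held-out rows, that is usually a sign to revisit the features or the model before trying anything fancier.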

In the end, you can definitely count yourself good enough to start applying for Data Science positions confidently. I know there is much more to learn in this field, but this will surely give you a head start and an edge over your competitors. Feel free to add anything I missed, and ping me if you have any further queries. Cheers!


