# Data Science
__GOAL:__ What are the pros and cons of doing data science? What are
good programs for studying it?
__SUMMARY:__
- I give some background motivation and a short (very _opinionated_)
summary of several programs. (Everything is in bullet points except
the quoted syllabi.)
- Galvanize appears to be the best of the programs you listed.
- Depending on how self-motivated you are, I think a self-directed
project on something that interests you is the quickest way to learn
and get up to speed. I'd be happy to help you, and I think you could
get more breadth and depth than a boot camp offers, in less time.
This does not address networking or job placement, but depending on
what you want to do, I may be able to make some key introductions.
Program | Pros | Cons
-- | -- | --
Galvanize | syllabus makes sense; focus on real projects; job placement | instructors not impressive; "pair programming"; "cookie cutter"
General Assembly | hands-on; focus on jobs; "critical thinking" | unimpressive instructors who teleconference in; too shallow and cheesy
Metis | application-oriented; instructors may be better (if they are the ones who actually teach) | buzzwordy and scattered
Data Camp | start for free; anywhere, anytime; short; hands-on | too low-level ("load data and manipulate data in Pandas")
================================
================================
# When Is Something Worth Doing?
Something is worth doing if it opens the largest number of good
opportunities for you without limiting or negatively impacting you.
To me, data science means using information to solve problems. That
is the most fundamental tool, and it can be applied _anywhere_. You
don't limit yourself the way you might if you studied math, physics,
medicine, business, or even computer science.
================================
# Data Science Pros and Cons
- PRO: It is one of the most fundamental skills: techniques for
solving problems using information.
- PRO: It is in high demand. Companies want to hire people who can
load up their data and extract "magical" insights from it.
- CON: It is a buzzword field, not as "rigorous" as a degree in math,
physics, or computer science.
- CON: It attracts "plug-and-chuggers": people who don't know what
they are doing but know how to operate the tools. Anyone who can
think rigorously and scientifically, and can break down and solve
complex problems, will clearly stand out over these simple tool users.
================================
# What Should a Data Science Course Give You?
A data science course should give you:
Component | What it means
-- | --
__Theory__ | Understanding how and why things work. Guiding principles for tackling complicated problems.
__Tools__ | A toolchest. Perhaps not a complete understanding of each tool, but enough of an introduction that you know it exists and where it might apply. This includes math (linear regression), compute (Python, R), and infrastructure (bash, databases) tools.
__Application__ | Solving a real, interesting problem. The course should not just introduce the theory and tools but use them to solve something.
__Networking__ | By interacting with the instructor and other students, you can gain contacts.
__Key to Jobs__ | If hiring managers value the program or recruit its students, it can be a key to a job.
================================
================================
# Evaluation of Programs
I'll look at several data science programs and summarize what I think
each offers. For each one, I'll google "<program> data science" and
look at what comes up on the first page.
- Galvanize
- General Assembly
- Metis
- Data Camp
I won't try to compile a list of the "best" programs, academic or boot
camp. Nor will I try to design the "perfect" data science program that
offers everything I think is important. I could do either, but for now
I'll focus on the few above.
================================
# Program: Galvanize
https://www.galvanize.com/data-science
- webpage is a bit too "slick"/cool
- seems to focus on jobs
- "solve today's business problems": I guess that's the focus...
- 13 weeks + 12 weeks of prep, $18k
- "get started with prep for free"?
- the "our instructors" section is not that impressive
```
SYLLABUS:
QUARTER 1 Data Science Fundamentals: Python and Statistics. Students
jump right into a Python-based curriculum where we explore and learn
statistical analysis, including frequentist and Bayesian methods. By
following software-engineering best practices and pair programming
with fellow students from different backgrounds, students master
concepts fundamental to data science while growing in skill with
libraries like numpy, scipy, and pandas.
QUARTER 2 Machine Learning & Prediction In the second quarter, we dive
into machine learning, working on real problems in classification,
regression, and clustering using structured and unstructured data
sets. We build a conceptual understanding of each model before
practicing with libraries used in the industry.
QUARTER 3 Natural Language Processing & Recommenders. In the third
quarter, we add a variety of special topics to our data-science
knowledge, including natural-language processing, recommender systems,
neural networks, and time-series data. We gain experience with big
data and data in the cloud. By the end of this section, students
should be well versed and ready to work on their own.
QUARTER 4 Capstone Projects & Case Studies. Over the course of our
immersive program students work independently on three data-science
projects unique to their interests or career aspirations. These
"capstone projects" solve real problems using the technical skill set
students have learned throughout the course and demonstrate their
competence and fitness as a professional data scientist. Students also
work on several group case studies throughout the program, coalescing
what they've learned each week with real-world data while practicing
team-based software development.
```
- PRO: syllabus makes sense
- PRO: focus on real projects
- CON: "frequentist and Bayesian methods": who cares? Buzzwords that
make it sound deep.
- CON: "pair programming with fellow students": no thank you.
- CON: not sure why the emphasis on "natural language and recommenders"
https://www.quora.com/Is-the-Galvanize-Data-Science-Immersive-worth-the-cost
- CON: sounds good but does not deliver: "cookie cutter", "2 hours of instruction per day", "instructors not good"
================================
# Program: General Assembly
https://generalassemb.ly/education/data-science-immersive
- hands-on
- focus on getting hired
- instructors seem to be teleconferencing in by video from Washington,
DC, Denver, Saudi Arabia (?), and Singapore. Not impressive.
```
SYLLABUS:
Git, UNIX, & Relational Databases. Gather, store, and organize your
data using your basic data science toolkit: SQL, Git, and UNIX.
Data Analysis & Python. Perform visual and statistical analysis on
data using Python and its associated libraries and tools.
Machine Learning, Modeling Techniques, & Big Data. Explore the
differences between supervised and unsupervised learning through the
application of various modeling techniques such as classification,
regression, and clustering.
Critical Thinking & Synthesis. Apply your analysis and modeling
skills to real world data problems in fields like finance, marketing,
and public policy.
Visualization, Presentation, & Reporting. Learn to create
reproducible presentations and reports and use data visualisation
tools to present your findings to key stakeholders.
```
- CON: leading with Git makes me think it's too low-level
- CON: "visual ... analysis" seems cheesy
- CON: "explore the differences": why not understand the differences?
- PRO: "critical thinking"
- CON: "fields like finance, marketing, and public policy." __really?__
https://generalassemb.ly/education/data-science
- PRO: makes sense as a short course. CON: too simple.
https://generalassemb.ly/education
- CON: Not deep: "mastering negotiations workshop", "agile and scrum",
"excel", "seo". In my opinion, offering a course in SEO is in bad taste.
https://www.quora.com/Is-General-Assemblys-Data-Science-course-worth-the-cost
- CON: This gives some negative feedback that seems genuine.
================================
# Program: Metis
https://www.thisismetis.com/data-science-bootcamps
- 12 weeks, $17k
- skills and connections
- portfolio of 5 projects using real data
- job placement is highlighted
```
SYLLABUS:
Week 1: Introduction to the Data Science Toolkit Exploratory Data
Analysis, Bash, Git & GitHub, Python, pandas, matplotlib, Seaborn
Week 2: Linear Regression and Machine Learning Intro Web scraping via
BeautifulSoup and Selenium, regression with statsmodels and
scikit-learn, feature selection overfitting and train/test splits,
probability theory.
Week 3: Linear Regression and Machine Learning Continued
Regularization, hypothesis testing , intro to Bayes Theorem
Week 4: Databases and Introduction to Machine Learning Concepts
Classification and regression algorithms (Knn, logistic regression,
SVM, decision trees, and random forest), SQL concepts, cloud servers
Week 5: More supervised learning algorithms & web tools Naive Bayes,
stochastic gradient descent and intro to Deep Learning, Full stack in
a nutshell: Python Flask, Javascript and D3.js
Week 6: Statistical Fundamentals MLE, GLM, Distributions, Databases (
RESTful APIs, NoSQL databases, MongoDB, pymongo) Natural Language
Processing techniques
Week 7: Unsupervised Machine Learning Various clustering algorithms,
including K-means and DBSCAN, dimension reduction techniques (PCA,
SVD, LDA, NMF)
Week 8: More Deep Learning & Unsupervised Learning Deep Learning via
Keras, Recommender Systems
Week 9: Big Data Hadoop, Hive & Spark, Final project initiated
Week 10-12: Final Project
```
- CON: buzzwordy and scattered.
- PRO: application-oriented
- PRO: instructors may be better... if they are the ones who actually teach.
================================
# Program: Data Camp
https://www.datacamp.com/
- PRO: start for free
- PRO: anywhere, anytime
- PRO: short, hands-on
https://www.datacamp.com/tracks/data-scientist-with-python
https://www.datacamp.com/courses/introduction-to-data-science-in-python
- CON: too low-level (but short)