# Data Science

__GOAL:__ what are pros / cons of doing data science. What are good programs to study it.

__SUMMARY:__

- I give background motivation and a (very _opinionated_) short summary of a number of programs. (Bullet points, except where I list the syllabus.)
- Galvanize appears to be the best one out of the ones you listed.
- Depending on your self-motivation level, I think doing a self-directed project on something you are interested in is the quickest way to learn and come up to speed. I'd be happy to assist you and get you breadth and depth that exceeds what you would get from a boot camp, in a shorter amount of time. This might not address social networking and job placement. But depending on what you want to do, I might be able to make some key introductions.

PROGRAM | PRO | CON
-- | -- | --
Galvanize | syllabus makes sense, focus on real projects, job placement | instructors not impressive, "pair programming", "cookie cutter"
General Assembly | hands on, focus on jobs, "critical thinking" | unimpressive instructors teleconference in, too shallow and cheesy
Metis | application oriented. instructors maybe are better... If they are the ones who teach. | buzzwordy and scattered.
Data Camp | start for free, anywhere, anytime, short, hands-on | too low level ("load data and manipulate data in Pandas")

================================

================================

# When Is Something Worth Doing?

Something is worth doing if it opens the largest number of good opportunities for you without limiting or negatively impacting you.

Data science to me means how to use information to solve problems. This is the most fundamental tool and can be applied _anywhere_. You don't necessarily limit yourself, which might happen if you studied math, physics, medicine, business, or even computer science.

================================

# Data Science Pros and Cons

- PRO: one of the most fundamental skills: techniques to solve problems using information.
- PRO: in high demand. People want to hire people to load up their data and give "magical" insights into it.
- CON: A buzzword field. Not as "rigorous" as a degree in math, physics, computer science.
- CON: Attracts "plug-and-chuggers": people who don't know what they are doing but know how to operate the tools. Anyone who can do rigorous scientific thinking, break down complex problems, and solve them will clearly stand out over these simple tool users.

================================

# What Should a Data Science Course Give You?

A data science course should give you:

Key | Value
-- | --
__Theory__ | Understanding how and why things work. Guiding principles when tackling complicated problems.
__Tools__ | A toolchest of things to use. Maybe not a complete understanding of them, but an introduction so you know they exist and where they might be applied. This will include math (linear regression), compute (python, R), and infrastructure (bash, databases) tools.
__Application__ | Solve a real, interesting problem. Don't just introduce the theory and tools but use them to solve something.
__Networking__ | By interacting with the instructor and other students, you can gain contacts.
__Key to Jobs__ | If hiring managers value the program or recruit students, then this can be a key to a job.

================================

================================

# Evaluation of Programs

I will look at some data science programs and try to summarize what I think they offer. I'll google "< program > data science" and see what I find on the first page.

- Galvanize
- General Assembly
- Metis
- Data Camp

I won't try to find a list of the "best" programs, either academic or boot-camp. I won't try to formulate the "perfect" data science program that offers everything I think is important. I can do it, but will focus on the few above for now.
================================

# Program: Galvanize

https://www.galvanize.com/data-science

- webpage a bit too "slick" / cool
- seems to focus on jobs
- "solve today's business problems" I guess that's the focus...
- 13 weeks + 12 weeks pre. $18k
- "get started with prep for free" ?
- "our instructors" not that impressive

```
SYLLABUS:

QUARTER 1
Data Science Fundamentals: Python and Statistics.
Students jump right into a Python-based curriculum where we explore and learn statistical analysis, including frequentist and Bayesian methods. By following software-engineering best practices and pair programming with fellow students from different backgrounds, students master concepts fundamental to data science while growing in skill with libraries like numpy, scipy, and pandas.

QUARTER 2
Machine Learning & Prediction
In the second quarter, we dive into machine learning, working on real problems in classification, regression, and clustering using structured and unstructured data sets. We build a conceptual understanding of each model before practicing with libraries used in the industry.

QUARTER 3
Natural Language Processing & Recommenders.
In the third quarter, we add a variety of special topics to our data-science knowledge, including natural-language processing, recommender systems, neural networks, and time-series data. We gain experience with big data and data in the cloud. By the end of this section, students should be well versed and ready to work on their own.

QUARTER 4
Capstone Projects & Case Studies.
Over the course of our immersive program students work independently on three data-science projects unique to their interests or career aspirations. These "capstone projects" solve real problems using the technical skill set students have learned throughout the course and demonstrate their competence and fitness as a professional data scientist. Students also work on several group case studies throughout the program, coalescing what they've learned each week with real-world data while practicing team-based software development.
```

- PRO: syllabus makes sense
- PRO: focus on real projects
- CON: "frequentist and Bayesian methods". who cares, buzzwords that make you think it's deep
- CON: "pair programming with fellow students". no thank you.
- CON: not sure why emphasis on "natural language and recommenders"

https://www.quora.com/Is-the-Galvanize-Data-Science-Immersive-worth-the-cost

- CON: says good but does not deliver. "cookie cutter", "2 hours of instruction per day", "instructors not good"

================================

# Program: General Assembly

https://generalassemb.ly/education/data-science-immersive

- hands on
- hired
- instructors seem to teleconference in by video. From Washington DC, Denver, Saudi Arabia (?), Singapore. Not impressive.

```
SYLLABUS:

Git, UNIX, & Relational Databases.
Gather, store, and organize your data using your basic data science toolkit: SQL, Git, and UNIX.

Data Analysis & Python.
Perform visual and statistical analysis on data using Python and its associated libraries and tools.

Machine Learning, Modeling Techniques, & Big Data.
Explore the differences between supervised and unsupervised learning through the application of various modeling techniques such as classification, regression, and clustering.

Critical Thinking & Synthesis.
Apply your analysis and modeling skills to real world data problems in fields like finance, marketing, and public policy.

Visualization, Presentation, & Reporting.
Learn to create reproducible presentations and reports and use data visualisation tools to present your findings to key stakeholders.
```

- CON: leading with Git makes me think too low level
- CON: "visual .. analysis" seems cheesy
- CON: "explore differences" why not understand differences
- PRO: "critical thinking"
- CON: "fields like finance, marketing, and public policy." __really?__

https://generalassemb.ly/education/data-science

- PRO: makes sense for short course
- CON: too simple

https://generalassemb.ly/education

- CON: Not deep. "mastering negotiations workshop", "agile and scrum", "excel", "seo". Offering a course in SEO is in bad taste in my opinion

https://www.quora.com/Is-General-Assemblys-Data-Science-course-worth-the-cost

- CON: This gives some negative feedback that seems genuine.

================================

# Program: Metis

https://www.thisismetis.com/data-science-bootcamps

- 12 week. $17k
- skills and connections.
- real data 5-project portfolio
- job placement is highlighted

```
SYLLABUS:

Week 1: Introduction to the Data Science Toolkit
  Exploratory Data Analysis, Bash, Git & GitHub, Python, pandas, matplotlib, Seaborn
Week 2: Linear Regression and Machine Learning Intro
  Web scraping via BeautifulSoup and Selenium, regression with statsmodels and scikit-learn, feature selection overfitting and train/test splits, probability theory.
Week 3: Linear Regression and Machine Learning Continued
  Regularization, hypothesis testing, intro to Bayes Theorem
Week 4: Databases and Introduction to Machine Learning Concepts
  Classification and regression algorithms (Knn, logistic regression, SVM, decision trees, and random forest), SQL concepts, cloud servers
Week 5: More supervised learning algorithms & web tools
  Naive Bayes, stochastic gradient descent and intro to Deep Learning, Full stack in a nutshell: Python Flask, Javascript and D3.js
Week 6: Statistical Fundamentals
  MLE, GLM, Distributions, Databases (RESTful APIs, NoSQL databases, MongoDB, pymongo), Natural Language Processing techniques
Week 7: Unsupervised Machine Learning
  Various clustering algorithms, including K-means and DBSCAN, dimension reduction techniques (PCA, SVD, LDA, NMF)
Week 8: More Deep Learning & Unsupervised Learning
  Deep Learning via Keras, Recommender Systems
Week 9: Big Data
  Hadoop, Hive & Spark, Final project initiated
Week 10-12: Final Project
```
- CON: buzzwordy and scattered.
- PRO: application oriented
- PRO: instructors maybe are better... If they are the ones who teach.

================================

# Program: Data Camp

https://www.datacamp.com/

- PRO: start for free
- PRO: anywhere, anytime
- PRO: short, hands-on

https://www.datacamp.com/tracks/data-scientist-with-python

https://www.datacamp.com/courses/introduction-to-data-science-in-python

- CON: too low level (but short)
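
To make "too low level" a bit more concrete, here is a rough sketch of the difference I have in mind. The first half (load a table, summarize it) is about the level of a "load data and manipulate data" exercise. The second half (fit a line by least squares) is the first small step toward the math tools mentioned earlier. Everything here is an assumption for illustration: the data are made up, the names `RAW_CSV`, `load_rows`, and `fit_line` are hypothetical, and I use only the Python standard library instead of pandas so it runs anywhere.

```python
import csv
import io
from statistics import mean

# Hypothetical toy data, invented for this illustration only.
RAW_CSV = """hours,score
1,52
2,55
3,61
4,64
5,70
"""


def load_rows(text):
    """Parse CSV text into a list of dicts with float values."""
    reader = csv.DictReader(io.StringIO(text))
    return [{key: float(value) for key, value in row.items()} for row in reader]


def fit_line(xs, ys):
    """Ordinary least squares fit of y = slope * x + intercept."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    slope = sxy / sxx
    return slope, y_bar - slope * x_bar


# "Load and manipulate": the tool-operation level.
rows = load_rows(RAW_CSV)
hours = [row["hours"] for row in rows]
scores = [row["score"] for row in rows]
print(f"rows: {len(rows)}, mean score: {mean(scores):.1f}")

# "Understand and apply": fit a simple model to the same data.
slope, intercept = fit_line(hours, scores)
print(f"fit: score = {slope:.2f} * hours + {intercept:.2f}")
```

A self-directed project would start from something like this on data you actually care about, then keep asking why the fit looks the way it does.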