

13 Exciting Data Science Project Ideas & Topics for Beginners [2023]

Rohit Sharma is the Program Director for the UpGrad-IIIT Bangalore, PG Diploma Data Analytics Program.
Table of Contents
In this article, you will learn about 13 exciting data science project ideas and topics for beginners.
1. Beginner Level | Data Science Project Ideas
- Climate Change Impacts on the Global Food Supply
- Fake News Detection
- Human Action Recognition
- Forest Fire Prediction
- Road Lane Line Detection
2. Intermediate Level | Data Science Project Ideas
- Recognition of Speech Emotion
- Gender and Age Detection with Data Science
- Driver Drowsiness Detection in Python
- Chatbots
- Handwritten Digit & Character Recognition Project
3. Advanced Level | Data Science Project Ideas
- Credit Card Fraud Detection Project
- Customer Segmentation
- Traffic Signs Recognition
4. Top Data Analytics Projects
- Web Scraping
- Data Cleaning
- Exploratory Data Analysis
- Sentiment Analysis
Read more to know each in detail.
An Introduction to Data Science Project Ideas
Data Science continues to thrive as a career option, and it is among the most promising choices available today. The market demand for Data Scientists keeps growing, and recent reports suggest it will increase further in the coming years. So, if you are a data science beginner, the best thing you can do is work on some real-time data science project ideas.
If you are an aspiring Data Scientist, practicing your skills is the surest way to become an efficient professional in this field. Once you have a good theoretical grounding in Data Science and want to find out what working as a professional is really like, it is time to take on some practical projects.
Technical, real-time Data Science projects help boost your career growth. The more you practice with Data Science projects, the better you keep up the pace towards becoming a sound Data Science professional.
Working on live Data Science projects will enhance your knowledge, technical skills, and overall confidence. Most importantly, showcasing even a few Data Science projects on your resume makes it much easier to land a good job. Why so? Because the interviewer can see that you are serious about a Data Science career.
Real-time experience with live Data Science projects also gives you a strong grip on Data Science trends and technologies. So, lay your hands on real-time Data Science projects and you will see how beneficial they are for your career growth. We also know that finding the right Data Science project idea can be a bigger concern than the actual implementation.
To answer your question, ‘What kind of Data Science project is good to start with?’, we have compiled a list of good Data Science project ideas in this blog for you to choose from.
The article also includes some of the best data science projects for beginners that you can check out.
Why Should You Learn Data Science?
Before going further into the different data science project ideas that are available, let’s take a look at some of the reasons why data science projects are considered to be so important in today’s world.
1. Data is the new driving force behind industries
Needless to say, in today’s technology-driven world, large enterprises across industries rely heavily on data for everything from business growth to expansion. It would not be wrong to say that data is the electricity that powers today’s industries.
Industries use data to improve their performance, generate revenue, and provide better customer service. In fact, the automobile industry, too, is harnessing the power of data to improve the safety of its vehicles. Its goal is to create powerful machines that make decisions based on data.
2. Demand And Supply
Although data is abundant, there are not enough skilled people who can convert it into powerful products. In other words, there is still a large shortage of data scientists, caused by a lack of data literacy in the market.
3. High Paying Job Opportunities
Currently, data science is considered a highly lucrative career. In fact, according to some researchers, a data scientist makes 63% more than the national average salary. Apart from this, data scientists also enjoy a position of prestige in the company, because companies rely heavily on them to make data-driven decisions and guide the organization in the right direction.
4. Data Science is the next big thing
As more and more industries are becoming data-driven, there is a constant need for data scientists. The field of technology is becoming more dynamic and new innovations are being made every day. Thus, data science is the career of the future.
Here is a list of Data Science project ideas for you. In the blog ahead, we discuss a few of these projects in detail. So let’s begin!
- Analyzing the impact of climate change on global food supply
- Weather Prediction
- Keyword generation for google ads
- Wine Quality Analysis
- Stock Market Prediction
- Video Classification
- Medical Report Generation using CT Scans
- Email Classification
- Uber Data Analysis
- Sound Classification
- Credit Card Fraud Detection
- Sign Language Recognition
- Class of Flower Prediction
- Colour Detection
- Loan Prediction
- Road Traffic Prediction
- Income Classification
- Speech Emotion Recognition
- Celebrity Voice Prediction
- Store Sales Prediction
- Detecting Parkinson’s Disease
- Air Pollution Prediction
- Age and Gender Detection
- Optimizing Product Price
- IMDB Predictions
- Handwritten Digit Recognition
- Quora Insincere Questions Classification
- Driver Drowsiness Detection
- Web Traffic Time Series Forecasting
- Survival Prediction on the Titanic
- Time Series Modelling
- Image Caption Generator
- Insurance Purchase Prediction
- Crime Analysis
- Customer Segmentation
- Taxi Trip Time Prediction
- Job Recommendation System
- Boston Housing Predictions
- Interest Level in Rental Properties
- Breast Cancer Classification
- Employee Computer Access Needs
- Tweets Classification
- Movie Recommendation System
- Product Price Suggestions
Latest Data Science Project Ideas
We have segmented the Data Science project ideas by learner level, so you will find project briefs for the beginner, intermediate, and advanced levels.
This list of data science project ideas for students is suited for beginners and those just starting out with Python or Data Science in general. These ideas will get you going with the practical skills you need to succeed in your career as a data scientist.
Further, if you’re looking for data science project ideas for your final year, this list should get you going. So, without further ado, let’s jump straight into some data science project ideas that will strengthen your base and allow you to climb up the ladder.
1. Beginner Level | Data Science Project Ideas
1.1 Climate Change Impacts on the Global Food Supply
The first entry on our list of data science projects for beginners is the impact of climate change on the global food supply.
Climate change and climate irregularities are major environmental challenges, and they are drastically affecting human lives across the Earth. This Data Science project concentrates on quantifying how climate change will affect food production worldwide.
The main aim of this project is to estimate the potential effects of climate change on staple crop production. It examines the implications of changes in temperature and precipitation, and it takes into account how carbon dioxide affects plant growth as well as the uncertainties in climate conditions. Hence, this project deals largely with data visualisation. It also compares production across regions and over different time periods.
1.2 Fake News Detection
You can drive your Data Science career with this Data Science project idea for beginners: detection of fake news using Python. Fake news is false or misleading journalism published on digital platforms, and it spreads through social media and other online channels, often to further a political agenda.
With this data science project idea, you can use Python to develop a model that detects whether a news item is real journalism or false information. For this, you convert the text into features with a ‘TfidfVectorizer’ and then use a ‘PassiveAggressiveClassifier’ to classify the news as either “Real” or “Fake”. You will work with a dataset of shape 7796×4 and execute everything in ‘JupyterLab’.
The main idea of this Data Science project is to develop a machine learning model that can correctly detect the authenticity of social media news. ‘TF’, or ‘Term Frequency’, is the number of times a word appears in a single document. ‘IDF’, or ‘Inverse Document Frequency’, measures how significant a word is based on how often it occurs across the documents in the collection: the more documents contain the word, the lower its IDF.
The theory concerns common words: if a word appears in many documents with a high frequency, it is considered less important. ‘TfidfVectorizer’ analyzes a collection of documents and creates a ‘TF-IDF’ matrix from it.
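To make the weighting concrete, here is a minimal sketch that computes TF-IDF by hand for a three-document toy corpus. It assumes the plain textbook form, IDF = log(N / document frequency); scikit-learn's ‘TfidfVectorizer’ by default smooths the IDF and normalizes each row, so its numbers will differ. The corpus and the function names are invented for illustration.

```python
import math
from collections import Counter

# Toy corpus (invented for illustration).
docs = [
    "the election results were announced today",
    "the aliens rigged the election",
    "the weather today is sunny",
]
tokenized = [d.split() for d in docs]


def term_frequency(tokens):
    """Raw count of each word in one document."""
    return Counter(tokens)


def inverse_document_frequency(corpus):
    """Textbook IDF: log(N / number of documents containing the word)."""
    n_docs = len(corpus)
    doc_freq = Counter(word for tokens in corpus for word in set(tokens))
    return {word: math.log(n_docs / df) for word, df in doc_freq.items()}


idf = inverse_document_frequency(tokenized)
tfidf = [
    {word: count * idf[word] for word, count in term_frequency(tokens).items()}
    for tokens in tokenized
]

# "the" occurs in every document, so its weight collapses to zero;
# "aliens" occurs in only one document, so it keeps the highest IDF.
print(tfidf[1]["the"], round(tfidf[1]["aliens"], 3))
```

A word such as "the" that occurs everywhere carries no information about which document you are looking at, which is exactly why its weight drops out.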
Along with this, a ‘PassiveAggressive’ classifier remains ‘passive’ when the classification outcome is correct, but it changes aggressively when the classification outcome is incorrect. So, with this Data Science project idea, you can create a machine learning model that detects whether social media news is genuine or fake.
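The passive/aggressive behaviour can be sketched in a few lines. This is a simplified version of the basic hinge-loss update for labels in {-1, +1}, not the scikit-learn implementation, which adds further options such as a regularization parameter. Note that in this form "passive" means the example is classified correctly with a margin of at least 1. The feature vectors below are hypothetical.

```python
def pa_update(weights, features, label):
    """One passive-aggressive step for a binary label in {-1, +1}."""
    margin = label * sum(w * x for w, x in zip(weights, features))
    loss = max(0.0, 1.0 - margin)  # hinge loss
    if loss == 0.0:
        return weights  # passive: correct with enough margin, no change
    norm_sq = sum(x * x for x in features)
    if norm_sq == 0.0:
        return weights  # an all-zero vector gives nothing to learn from
    step = loss / norm_sq  # aggressive: move just far enough to fix the margin
    return [w + step * label * x for w, x in zip(weights, features)]


# Hypothetical 2-feature examples; +1 stands for "Real", -1 for "Fake".
weights = [0.0, 0.0]
weights = pa_update(weights, [1.0, 0.0], +1)  # margin too small: weights move
after_mistake = list(weights)
weights = pa_update(weights, [2.0, 0.0], +1)  # margin already >= 1: unchanged
after_correct = list(weights)
print(after_mistake, after_correct)
```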
1.3 Human Action Recognition
This is a Data Science project on a human action recognition model. It looks at short videos of human beings performing specific actions and tries to classify them based on the action performed. In this Data Science project, you need to use a complex neural network trained on a dataset that contains these short videos. The dataset also comes with associated accelerometer data, which is first converted into a ‘time-sliced’ representation. Thereafter, you use the ‘Keras’ library to train, validate, and test the network on these datasets.
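One plausible reading of the ‘time-sliced’ representation is that the continuous accelerometer stream is cut into fixed-length, overlapping windows, each of which becomes one training example. The sketch below assumes that interpretation; the window size, the step, and the fake readings are all illustrative choices, not values from a specific dataset.

```python
def sliding_windows(samples, window_size, step):
    """Split a sequence of (x, y, z) readings into fixed-length windows."""
    return [
        samples[start:start + window_size]
        for start in range(0, len(samples) - window_size + 1, step)
    ]


# Ten fake accelerometer readings (x, y, z); real data would come from the dataset.
readings = [(i * 0.1, i * 0.2, 9.8) for i in range(10)]

# Windows of 4 readings that advance 2 readings at a time (50% overlap).
windows = sliding_windows(readings, window_size=4, step=2)
print(len(windows), len(windows[0]))
```

Each window would then be labelled with the action performed during that time span before it is passed to the network.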
1.4 Forest Fire Prediction
Forest fires are among the most alarming and common disasters in today’s world, and they are highly damaging to the ecosystem. Dealing with them requires a lot of money for infrastructure, control, and handling. We can build a Data Science project using ‘k-means clustering’ to identify forest fire hotspots along with the severity of the fire at each spot.
The results can then be used for better resource allocation and faster response times. Using meteorological data, such as the seasons in which these fires are more likely to happen and the weather conditions that worsen them, may increase the accuracy of the results.
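As a rough illustration of how k-means finds hotspots, the sketch below runs the standard assign-then-update loop (Lloyd's algorithm) on a handful of made-up fire coordinates. The points, the number of clusters, and the starting centroids are assumptions chosen to keep the example deterministic; a real project would more likely rely on a library implementation and real location data.

```python
import math


def kmeans(points, centroids, iterations=10):
    """Plain Lloyd's algorithm on 2-D points; returns (centroids, labels)."""
    labels = []
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest centroid.
        labels = [
            min(range(len(centroids)), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        # Update step: move each centroid to the mean of its points.
        new_centroids = []
        for c in range(len(centroids)):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                new_centroids.append(
                    tuple(sum(axis) / len(members) for axis in zip(*members))
                )
            else:
                new_centroids.append(centroids[c])  # leave an empty cluster in place
        if new_centroids == centroids:
            break  # converged
        centroids = new_centroids
    return centroids, labels


# Hypothetical fire locations as (x, y) grid coordinates.
fires = [(1.0, 1.0), (1.5, 2.0), (1.0, 1.5), (8.0, 8.0), (9.0, 8.5), (8.5, 9.0)]
hotspots, labels = kmeans(fires, centroids=[fires[0], fires[3]])
print(hotspots, labels)
```

The number of fires assigned to each centroid could serve as a crude proxy for how active a hotspot is.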
1.5 Road Lane Line Detection
Another Data Science project idea for beginners is a live lane-line detection system built in Python. In this project, a human driver receives guidance on lane detection through lines drawn on the road.
The system also indicates the direction in which the driver should steer the vehicle. This kind of application is vital for the development of driverless cars. You can develop an application that identifies lane lines from input images or from continuous video frames.
2. Intermediate Level | Data Science Project Ideas
2.1 Recognition of Speech Emotion

One of the popular Data Science project ideas is speech emotion recognition. If you want to learn how to use different libraries, this project is perfect for you. You may have seen tools that tell us what emotion our speech conveys. A model like this can be built as a Data Science project.
In this Data Science project, we will use ‘librosa’ to analyse the audio for ‘Speech Emotion Recognition’ (SER). SER is the attempt to recognize human emotion and affective states from speech. It is possible because we use a combination of tone and pitch to express emotions through our voice.
Speech Emotion Recognition is absolutely feasible. However, it can be a challenging project because human emotions are very subjective, and annotating human audio is also quite difficult. Here you will use the mfcc, mel, and chroma features, along with the dataset known as ‘RAVDESS’, for the emotion recognition process. In this Data Science project, you will also learn how to develop an ‘MLPClassifier’ for this model.
2.2 Gender and Age Detection with Data Science

One of the impressive project ideas on Data Science is ‘Gender and Age Detection with OpenCV’. With this kind of real-time project, you can easily grab your recruiter’s attention in a Data Science interview.
‘Gender and Age Detection’ is a machine learning project based on computer vision. Through this Data Science project, you can learn the practical application of CNNs, i.e., convolutional neural networks. You will also use models trained by ‘Tal Hassner’ and ‘Gil Levi’ on the ‘Adience’ dataset.
Along with this, you will work with files such as .pb, .prototxt, .pbtxt, and .caffemodel. You may have heard of these file types and read about the models, but do you know how to implement them? You can learn this by developing a Data Science project around them.
It’s a very practical project, as you will create a model that detects a person’s age and gender by analysing a single face in an image. Gender is classified as man or woman, and age is classified into one of the ranges 0-2, 4-6, 8-12, 15-20, 25-32, 38-43, 48-53, and 60-100.
Because of factors such as makeup, bright or dim lighting, or an unusual facial expression, recognizing gender and age from a single image can be challenging. Therefore, in this Data Science project, you will use a classification model instead of a regression model. Projects like this offer a lot of practical and technical learning to upscale your skills. So, take up the challenge and work hard on it to build an impressive Data Science resume.
2.3 Driver Drowsiness Detection in Python
An excellent Data Science project idea for the intermediate level is the ‘Keras & OpenCV Drowsiness Detection System’. Driving overnight is not only tough but also risky. We have all heard of accidents that happened because the driver fell asleep at the wheel.
This project can help prevent road accidents of this kind. Its main aim is to recognize when the driver is getting drowsy and may fall asleep while driving. The project uses Python to build a model that detects sleepy driver behavior in time and raises a loud beeping alarm.
In this project, you implement a ‘deep learning model’ that classifies images according to whether a human eye is open or closed. In addition, the model calculates a score.
This score is based on how long the eyes remain closed, and it is maintained throughout the driving session. If the score increases and crosses a specified threshold, the system triggers the alarm.
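The scoring logic can be sketched independently of the image model. In the snippet below, the per-frame eye states stand in for the classifier's output, and the rule of adding one point per closed-eye frame, removing one per open-eye frame, and alarming above a threshold of 3 is an assumption made for illustration.

```python
def monitor(eye_states, threshold=3):
    """Yield (score, alarm) for each frame of 'open'/'closed' eye states."""
    score = 0
    for state in eye_states:
        if state == "closed":
            score += 1  # eyes stayed closed for another frame
        else:
            score = max(0, score - 1)  # recover slowly, never below zero
        yield score, score > threshold


# Hypothetical classifier output for six consecutive video frames.
frames = ["open", "closed", "closed", "closed", "closed", "open"]
results = list(monitor(frames))
print(results)
```

In a full system, the `alarm` flag would trigger the beeping sound, and the threshold would be tuned to the camera's frame rate.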
By implementing this kind of Data Science project, you will learn the basics of Data Science projects. You will implement it using ‘Keras’ and ‘OpenCV’. Why these two? You use ‘OpenCV’ to detect the face and eyes, whereas with ‘Keras’ you classify the eye’s state as open or closed using deep neural network techniques.
2.4 Chatbots

Chatbots are becoming increasingly popular, and almost all organizations now have a demand for them, which makes them a good Data Science project. They play a crucial role in businesses today: they save an enormous amount of human-resource time while providing an improved and personalized service.
Many businesses offer customer service. Providing it on a large scale requires a lot of human resources, time, and effort to handle each customer promptly. Chatbots can automate much of this customer interaction simply by answering the questions customers ask most frequently.
There are two types of chatbots available today: domain-specific chatbots and open-domain chatbots. A domain-specific chatbot is most often used to solve a particular problem, and it is customized carefully so that it works effectively within its domain. An ‘open-domain’ chatbot needs a large and continuously growing amount of training material because, as the name suggests, it is developed to answer any kind of question.
Technically speaking, chatbots are trained using ‘Deep Learning’ techniques. They need a dataset containing a vocabulary list, lists of common sentences, the intent behind them, and the appropriate responses. This is one of the trending data science project ideas.
‘Recurrent Neural Networks’ (RNNs) are a common way to train chatbots. These bots contain an encoder that updates its state according to the input sentence and its intent, and then passes that state on.
The chatbot then uses the decoder to find an appropriate response based on the input words and the intent. With this Data Science project, you can easily learn Python, as the complete project is written in Python, and upscale your Python skills.
2.5 Handwritten Digit & Character Recognition Project

With this Data Science project idea on ‘Handwritten Digit & Character Recognition’ with the help of CNNs, you will learn Deep Learning concepts in practice. If you are a budding Data Scientist or a machine learning enthusiast, this is the perfect Data Science project idea for you. For this project, you will use the ‘MNIST dataset’ of handwritten digits. It is a great way to get hands-on experience with Data Science, as you will learn the techniques involved in building a project end to end.
As discussed, this project is implemented with ‘Convolutional Neural Networks’. You will build a model that predicts the digits and, for real-time prediction, a graphical user interface for drawing digits on a canvas.
The project’s focus is on enabling the computer to recognize characters handwritten by humans with reasonable accuracy. With this project, you can learn the practical use of the ‘Keras’ and ‘Tkinter’ libraries.
These are some intermediate data science project ideas on which you can work. If you would still like to test your knowledge and take on some tougher projects, move on to the advanced ideas below.
3. Advanced Level | Data Science Project Ideas
3.1 Credit Card Fraud Detection Project

After implementing the easier projects, you can move on to some advanced Data Science project ideas to learn more concepts. One such idea is credit card fraud detection. With this project, you will learn how to use R with different algorithms such as Decision Trees, Artificial Neural Networks, Logistic Regression, and the Gradient Boosting Classifier.
You will also learn to use a ‘Card Transactions’ dataset to classify each credit card transaction as fraudulent or genuine, to fit the different types of models, and to plot performance curves for all of them. This is one of the best data science project ideas one can find.
3.2 Customer Segmentation

This is one of the most popular projects in the field of Data Science. Digital marketing is now a standard way for companies to target an audience through their online marketing activities. Before running a marketing campaign, companies first segment their customers.
Customer segmentation is among the most popular applications of unsupervised learning. Using clustering methods, companies can identify segments of customers and target the potential user base. Customers are divided into groups according to common characteristics such as gender, interests, age, and habits.
Based on these details, companies can market effectively to each customer group. The project uses ‘K-means clustering’, and you will learn how to visualize distributions such as gender and age. Customers’ annual incomes and average score values can also be analysed.
3.3 Traffic Signs Recognition

This project aims to develop a model that achieves high accuracy for self-driving car technologies using CNN techniques. Traffic signs and traffic rules are of utmost importance for every driver and must be followed to avoid accidents. To follow these rules, the driver must understand what each traffic sign looks like.
As a general rule, an individual has to learn all the traffic signs to obtain a driving license. For autonomous vehicles, programs such as ‘Traffic Signs Recognition’ use CNNs instead, and in this project you learn how to build a model that identifies various kinds of traffic signs from an input image.
The dataset used is the ‘German Traffic Sign Recognition Benchmark’, commonly known as GTSRB. It is used to develop a Deep Neural Network that recognizes the class to which each traffic sign belongs. You will also gain practical knowledge of building a GUI for interacting with the application.
Top Data Analytics Projects
Now that you have learned some of the best data science project topics, let’s take a look at some of the top data analytics project ideas and data science topics that are currently trending in the market.
1. Web Scraping
Knowing how to scrape data not only boosts your portfolio but also lets you explore and use data sets that match your interests without having to compile them by hand. Tools like Beautiful Soup or Scrapy let you crawl the web for interesting data.
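To show the basic idea of pulling structured data out of a page, here is a small sketch that extracts links from an HTML snippet. It deliberately uses Python's built-in `html.parser` module rather than Beautiful Soup or Scrapy, so that it runs without extra packages or network access; the HTML and the dataset paths in it are made up.

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collect (href, link text) pairs from anchor tags."""

    def __init__(self):
        super().__init__()
        self.links = []
        self._href = None
        self._text = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._href = dict(attrs).get("href")
            self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)  # text inside the current link

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            self.links.append((self._href, "".join(self._text).strip()))
            self._href = None


# Made-up page fragment standing in for a downloaded web page.
page = """
<ul>
  <li><a href="/datasets/wine">Wine quality</a></li>
  <li><a href="/datasets/titanic">Titanic passengers</a></li>
</ul>
"""
parser = LinkCollector()
parser.feed(page)
print(parser.links)
```

With a real site, you would first download the page and check the site's terms of use before scraping it.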
2. Data Cleaning
One of the most important tasks for every data analyst is cleaning data to make it ready for analysis. Data cleaning, also called data scrubbing, means ensuring that the data is consistent by removing duplicate or incorrect records and handling the gaps in the data. This is one of the best data science topics and is bound to add value to your candidature.
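The three cleaning steps mentioned above (duplicates, incorrect values, and gaps) can be sketched with nothing but the standard library. The records, the valid age range of 0 to 120, and the choice of filling gaps with the median are all assumptions for illustration; in practice a library such as pandas is the more common tool.

```python
from statistics import median

# Hypothetical raw records with a duplicate, a missing value, and an impossible age.
raw = [
    {"id": 1, "age": 34},
    {"id": 2, "age": None},
    {"id": 2, "age": None},  # exact duplicate
    {"id": 3, "age": 29},
    {"id": 4, "age": -5},    # incorrect value
    {"id": 5, "age": 41},
]


def clean(records):
    # 1. Drop exact duplicates, keeping the first occurrence.
    seen, unique = set(), []
    for row in records:
        key = (row["id"], row["age"])
        if key not in seen:
            seen.add(key)
            unique.append(dict(row))
    # 2. Treat out-of-range ages as missing.
    for row in unique:
        if row["age"] is not None and not 0 <= row["age"] <= 120:
            row["age"] = None
    # 3. Fill the gaps with the median of the valid ages.
    fill = median(row["age"] for row in unique if row["age"] is not None)
    for row in unique:
        if row["age"] is None:
            row["age"] = fill
    return unique


cleaned = clean(raw)
print(cleaned)
```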
3. Exploratory Data Analysis
To put it simply, data analysis is all about answering questions with data. With the help of exploratory data analysis (EDA), you can explore the different questions that you want to ask.
4. Sentiment Analysis
Last but not least is sentiment analysis, a technique in natural language processing that determines whether a piece of text is neutral, positive, or negative. It is especially useful for public review sites and social media platforms. Furthermore, sentiment analysis can detect a particular emotion using a list of words and their corresponding emotions, which is known as a lexicon.
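A lexicon-based approach can be sketched in a few lines: look up each word's score and add the scores together. The six-word lexicon and its scores below are invented for illustration; real lexicons contain thousands of scored words, and this simple sum ignores negation and context.

```python
import re

# Tiny hand-made lexicon (an assumption for this sketch).
LEXICON = {"good": 1, "great": 2, "love": 2, "bad": -1, "terrible": -2, "hate": -2}


def sentiment(text):
    """Label text as positive, negative, or neutral from summed word scores."""
    words = re.findall(r"[a-z']+", text.lower())
    score = sum(LEXICON.get(word, 0) for word in words)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


for review in [
    "I love this great phone",
    "Terrible battery, I hate it",
    "It arrived on Tuesday",
]:
    print(review, "->", sentiment(review))
```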
Bottom Line
In this article, we have covered top data science project ideas. We started with some beginner projects that you can solve with ease. Once you finish these simple data science projects, we suggest you go back, learn a few more concepts, and then try the intermediate projects.
When you feel confident, you can tackle the advanced projects. If you wish to improve your data science skills, you need to get your hands on these data science project ideas. Now go ahead and put all the knowledge you’ve gathered from this guide to the test by building your very own data science project!
We hope the project ideas presented in this blog help you drastically improve your Data Science skills. If you are new to the Data Science field and would love to learn it and build similar models, we recommend checking out upGrad and IIIT-B’s PG Diploma programs to learn and upskill with experienced professionals.
With the right knowledge, guidance, and tools, you can tackle any Data Science project. No level is too difficult for a learner. That’s why these live projects are a perfect way to enhance your skills and progress quickly towards mastery. At upGrad, we offer 3 Data Science online certifications:
1. Executive PG Programme in Data Science (12 months)
From IIIT Bangalore
2. Master of Science in Data Science (18 months)
From Liverpool John Moores University
3. Advanced Certificate Programme in Data Science (7 months)
Try these Data Science online certifications by upGrad, as we are sure that they will help you on your Data Science career path. Don’t delay! Start your practice now!
How to make a good Data Science project?
The following points should be kept in mind before starting any Data Science project:
- Choose the programming language that you are comfortable with. However, the language chosen should be one of the in-demand languages such as Python, R, and Scala.
- Use datasets from trusted sources. You can use Kaggle datasets.
- Make sure that the dataset you are using does not contain errors. Find errors or outliers in your dataset and rectify them before training your model. You can use visualization tools to find the errors in your dataset.
What are the major components that a Data Science project should have?
The following components highlight the most general architecture of a Data Science project:
- Problem Statement: This is the fundamental component on which the whole project is based. It defines the problem that your model is going to solve and discusses the approach that your project will follow.
- Dataset: This is a very crucial component of your project and should be chosen carefully. Only large enough datasets from trusted sources should be used for the project.
- Algorithm: This is the algorithm you use to analyze your data and predict the results. Popular algorithmic techniques include Regression Algorithms, Regression Trees, Naive Bayes Algorithm, and Vector Quantization.
- Training Models: This involves training your model against various inputs and predicting the output. This component decides the accuracy of your project. Using proper training techniques can produce better outcomes.
What are the skills required to be a Data Scientist?
The following are the essential skills and tools any Data Science enthusiast should master:
1. Statistical skills, including probability
2. Analytical skills to analyze and test the data
3. Programming languages such as Python, R, Scala, and Java
4. Data visualization tools such as Power BI and Tableau
5. Algorithms, including Regression, Decision Trees, and the Bayes Algorithm
6. Calculus and algebra
7. Communication and presentation skills
8. Databases such as SQL
9. Cloud computing to manage the resources
Apart from these technical skills, a professional Data Scientist should also have some soft skills to provide value to the company and improve interpersonal relationships. These skills include critical and curious thinking, business orientation, smart communication skills, problem-solving, team management, and creativity.


Easy Science Projects
- Ph.D., Biomedical Sciences, University of Tennessee at Knoxville
- B.A., Physics and Mathematics, Hastings College
Find an easy science project that you can do using common household materials. These easy projects are great for fun, home school science education, or for school science lab experiments.
Mentos and Diet Soda Fountain
All you need is a roll of Mentos candies and a bottle of diet soda to make a fountain that shoots soda into the air. This is an outdoor science project that works with any soda, but clean-up is easier if you use a diet drink.
Slime Science Project
There are many different ways to make slime. Choose from a collection of recipes to make slime using materials you have on hand. This science project is easy enough even young kids can make slime.
Easy Invisible Ink Project
Write a secret message and reveal it using science! There are several easy invisible ink recipes you can try, using corn starch, lemon juice, and baking soda.
Easy Vinegar and Baking Soda Volcano
The chemical volcano is a popular science project because it is very easy and yields reliable results. The basic ingredients for this type of volcano are baking soda and vinegar, which you probably have in your kitchen.
Lava Lamp Science Project
The type of lava lamp you would buy at the store actually involves some fairly complex chemistry. Fortunately, there is an easy version of this science project that uses non-toxic household ingredients to make a fun and rechargeable lava lamp.
Easy Ivory Soap in the Microwave
Ivory Soap can be microwaved for an easy science project. This particular soap contains air bubbles that expand when the soap is heated, turning the soap into a foam right before your eyes. The composition of the soap is unchanged, so you can still use it just like bar soap.
Rubber Egg and Chicken Bones Project
Vinegar reacts with the calcium compounds found in egg shells and chicken bones so that you can make a rubbery egg or bendable chicken bones. You can bounce the treated egg like a ball. While the project works with both fresh and cooked eggs, be sure you bounce a cooked egg because the yolk of a raw egg stays soft. The project is extremely easy and yields consistent results. It's great for first graders.
Easy Crystal Science Projects
Growing crystals is a fun science project. While some crystals can be hard to grow, there are several you can grow quite easily, such as Easy Alum Crystals, Copper Sulfate Crystals, and Borax Crystal Snowflakes.
Easy No-Cook Smoke Bomb
The traditional smoke bomb recipe calls for cooking two chemicals over a stove, but there is a simple version that doesn't require any cooking. Smoke bombs require adult supervision to light, so even though this science project is extremely easy, use some care.
Easy Density Column
There are several common household chemicals that may be layered in a glass to form an interesting and attractive density column. The easiest way to get clean layers is to pour each new layer very slowly over the back of a spoon held just above the previous liquid layer. If your hands are shaky, pouring layers down the side of a tilted glass works, too.
Chemical Color Wheel
You can learn about how detergents work by doing the dishes, but this easy project is much more fun! Drops of food coloring in milk are pretty unspectacular, but if you add a bit of detergent you'll get swirling colors.

Bubble "Fingerprints" Project
You can capture the impression of bubbles by coloring them with paint and pressing them onto paper. This science project is educational, plus it produces interesting art.
Water Fireworks
Explore diffusion and miscibility using water, oil and food coloring. There's actually no fire at all in these 'fireworks', but the way the colors spread out in water is reminiscent of pyrotechnics.
Easy Pepper and Water Project
Sprinkle pepper onto water, touch it, and nothing happens. Remove your finger (secretly applying a 'magic' ingredient) and try again. The pepper appears to rush away from your finger. This is a fun science project that seems like magic.
Chalk Chromatography Science Project
Use chalk and rubbing alcohol to separate out the pigments in food coloring or ink. This is a visually appealing science project that yields quick results.
Easy Glue Recipe
You can use science to make useful household products. For example, you can make non-toxic glue based on a chemical reaction between milk, vinegar, and baking soda.
Easy Cold Pack Project
Make your own cold pack using two kitchen ingredients. This is an easy non-toxic way to study endothermic reactions or to chill a soft drink can if you prefer.

Nov 2, 2017
My Most Valuable Advice to Budding Data Scientists on Quora and Coursera — Never stop learning!
I have been responding to a lot of requests ranging from:
“How do I get started in the field of Machine Learning, Deep Learning or Artificial Intelligence?”
“How do I advance from the basics that I know today…”
or even more honest and confronting ones like this from Coursera, where I am a mentor to a few thousand deep learning experts and enthusiasts.
“100/100 on assignments doesn’t mean you are a proficient…” asked by Noreddine Belhadj Cheikh in @Coursera
All valid questions, but how to move forward?
I have tried to answer these questions in the past 6 months from both my own learning experience as well as from a career perspective.
This is no regular bullshit and touchy-feely advice you read all over the place, this is how I’ve learned stuff.
Starting from the ground floor and racing upwards like a tiger!
FIRST, HERE ARE A FEW REQUESTS AND MY ANSWERS
Question 1: What is the best programming language to learn for a job in 2017? My answer →>> don’t waste time sitting or letting others scare you; get started.
Start with Python and …
- Ignore warnings such as programming is hard and all other BS.
- Pick up your laptop or pc.
- Install Python, C++, Perl, and a simple IDE.
- Spend 6–12 hours a day coding.
- Get on GitHub and start following the latest and coolest projects.
- Get on Stack Overflow to ask, answer, and participate.
In 6–12 months, you’ll have learned programming!
NOTE: I got a lot of criticism for this answer on Quora. I do realize that this is rather a steep climb for most, but my hope is to get you started; then you can settle into your own pace.
But, do please hurry!
Question 2 (more focused) : How do I start learning artificial intelligence? Is it possible to get research work in the field of A.I? Are there open source projects where I can contribute?
My answer: →>> if you know the basics, go deeper to learn advanced concepts.
Here’s the FSP* [fastest shortest path] I use to learn new things. This should help you.
- Get an account on GitHub and search for popular projects. Goal: take stock and plan to do 1 project a day. Finish it, no matter what.
- Get your laptop / PC and install Anaconda and a bunch of the latest deep learning frameworks (TensorFlow, MXNet, PyTorch/Torch, Keras, etc.). Goal: install them a couple of times, and I hope you’ll fail, because only then will you have learned.
- Open accounts at Coursera, edX, etc., but go and see which projects get you moving on the steepest slope of learning. Goal: measure by what you learned that was new and what you couldn’t understand. If you can answer both, you’re right on track!
- Book reading: On GitHub there are links to books that teach deep learning intensely. Goal 4.1, fast reading: have plenty of cookbooks for reference. Goal 4.2, deep reading: pick a good AI / deep learning book [1] and fully comprehend its fundamentals.
I can guarantee you that within a year you will have experienced a metamorphosis.
*Some understanding of linear algebra helps; even high school math is good. A bit of tech / programming background will help you move faster.
For the latest AI / deep learning tutorials, feel free to follow these, which I maintain daily: https://github.com/TarrySingh/Artificial-Intelligence-Deep-Learning-Machine-Learning-Tutorials
Question 3 (reality check) : Can I get a machine learning job if I finish Andrew Ng’s Deep Learning Specialization?
My answer: →>> getting a job is a combination of loads of other stuff, not just math.
Getting a job, besides your own (hopefully fully loaded) learning track and plan, also has to do with things like:
- Who you know: no matter how great your skills are and how loaded your resume is with the latest Udacity, Coursera, etc. certifications, you will have to rely on the generosity of someone who’ll get you through the gates and into an enterprise to get cracking.
- Where you’re living currently: the market is really flooded with “data scientists”, and in some regions like San Francisco, Seattle, ‘fill_in_any_top_city_in_nw_hemisphere’, supply has egregiously overshot demand. Try moving to where the demand will be instead of where it’s come and gone.
- Learn more skills than just data science: the next logical step in the AI/DL/ML revolution will be standardization, modularization and toolkitization. Meaning? It will become more business-centric and will be used directly by business owners to build smarter consumer-centric services. So, learn additional skills in both tech (systems engineering, OS, web programming like Node.js, etc.) and business domains (team lead, data science CoE setup, negotiation, presentation skills, cross-functional stakeholder management, etc.).
And still, I think Andrew’s course is important to follow because it gives you the opportunity to learn from a very passionate person.
Hope this helps.
2. COURSERA
Question: “100/100 on assignments doesn’t mean you are a proficient”. My answer: →>> don’t worry and keep going deeper; the job will come eventually as well.
Great question and you make a good point!
The assignments are “easy” because you’re guided on how to type a few lines of code in the graded block to complete the assignment.
Yes, that’s easy to score.
My advice to all learners/enthusiasts/proficient folks is to do the following:
- Dig deeper: Yes, understand how all the data you are getting was preprocessed.
- Play with data sets: Look at how they are split into train/dev (validation)/test sets.
- Do Python the hard way / think of it like a language: Type all of the code by hand to understand why you’re doing what you’re doing (I started doing that from my first assignment onwards).
- Go and look for more treasures: You have to go out there and participate in real, hands-on, even peer-reviewed, hardcore projects.
- Think of your own gem you wish to reveal/explore: Start thinking of your own projects and get yourself on GitHub.
- Have you tested your gem? Go build your app: Here are some real ideas: pick one area of expertise, such as NLP, and go deep into a field, say a chatbot to help lonely teenagers.
Doing your certification is the beginning of a journey to a place you want to be at.
Do you know where that is?
If not, find it!
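To make the tip about playing with data sets concrete, here is a minimal sketch of shuffling a dataset and cutting it into train, dev (validation), and test sets. The `split_dataset` name, the 80/10/10 proportions, and the fixed seed are assumptions chosen for illustration, not how any particular course prepares its data.

```python
import random


def split_dataset(rows, dev_frac=0.1, test_frac=0.1, seed=0):
    """Shuffle rows reproducibly, then cut them into train/dev/test lists."""
    rows = list(rows)
    random.Random(seed).shuffle(rows)  # fixed seed so the split is repeatable
    n_dev = int(len(rows) * dev_frac)
    n_test = int(len(rows) * test_frac)
    dev = rows[:n_dev]
    test = rows[n_dev:n_dev + n_test]
    train = rows[n_dev + n_test:]
    return train, dev, test


# Stand-in dataset: the integers 0..99 play the role of examples.
train, dev, test = split_dataset(range(100))
print(len(train), len(dev), len(test))  # → 80 10 10
```

Changing the fractions or the seed and re-running a model is a quick way to see how sensitive its reported score is to the split.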
(some asked for harder programming assignments): More “hardcore” programming problem sets?
My answer: →>> don’t obsess over theoretical victories, find real-world problems to solve.
For coding and solving more difficult problems, just look around on Coursera and elsewhere. You will find enough challenges to solve. You have fastai and Udacity to play with, enough research projects that require further inspection, and, finally, some awesome work that is waiting to be done.
The key lies in learning; if certification only leads to the self-satisfaction of “Yep, I have a cert”, then the purpose is defeated.
If you learned how and why RMSE works, that’s great; but if you thought that it actually doesn’t make sense and there’s a better way, then you’re on to something.
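One way to start questioning RMSE is to compare it with another error measure on the same predictions. The sketch below uses made-up numbers to compare RMSE with mean absolute error (MAE); choosing MAE as the comparison is an assumption of this example, not a claim that it is the better way.

```python
import math


def rmse(y_true, y_pred):
    """Root mean squared error: squaring penalizes large misses more."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


def mae(y_true, y_pred):
    """Mean absolute error: every unit of error counts the same."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


y_true = [0, 0, 0, 0]
small_errors = [1, 1, 1, 1]   # four misses of size 1
one_big_error = [0, 0, 0, 4]  # one miss of size 4

print(mae(y_true, small_errors), mae(y_true, one_big_error))    # → 1.0 1.0
print(rmse(y_true, small_errors), rmse(y_true, one_big_error))  # → 1.0 2.0
```

Both prediction sets have the same MAE, but RMSE doubles for the one with a single large miss. Whether that behavior makes sense depends on the problem.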
If you think ReLU is great and you use it day in, day out, fine; but if you think there’s a better way, like the recently published paper about Swish, then you may be on to something.
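For reference, here is a small sketch of both activations on scalars. Swish is x * sigmoid(beta * x); the function names and the default beta = 1 are choices made for this example.

```python
import math


def relu(x):
    """ReLU: zero for negative inputs, identity otherwise."""
    return max(0.0, x)


def swish(x, beta=1.0):
    """Swish: x * sigmoid(beta * x), written as x / (1 + exp(-beta * x))."""
    return x / (1.0 + math.exp(-beta * x))


for x in (-2.0, 0.0, 2.0):
    print(x, relu(x), round(swish(x), 4))
```

Unlike ReLU, Swish is smooth and lets small negative values through, which is the kind of difference worth testing on your own problem instead of taking on faith.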
Here’s a summary
- Don’t waste time sitting, get started! Sometimes a direct answer helps to get you moving, so get started today!
- If you know the basics, go deeper to learn advanced concepts: If you are seeking depth, there are enough explored depths you can get started with!
- Getting a job is a combination of stuff, not just math: Getting a job has a lot to do with your network, finding the right employers and fighting your own biases (spoiler alert: you can fight them but not beat them!)
- Don’t worry and keep going deeper, the job will come eventually as well: If you’ve gained an understanding of linear algebra and linear and logistic regression and have a certification, then move on to the next challenge. Climb higher; if it gets steeper, read some more, ask for advice and keep moving.
- Don’t obsess over theoretical victories, find real-world problems to solve: The world is full of problems, from those created by data (revenge porn, which there is research going on to tackle; teenage depression; digital disenfranchisement) to unsolved problems (medical and mental diseases, etc.).
More from Tarry Singh
Founder & CEO deepkapha.ai | 2.5 decades in industry tech and data | Entrepreneur, AI/ML/DL/NS Researcher

1000 Projects
Quora for College Full Stack Academic Project
Wikipedia is a collection of human knowledge, whereas Quora for College is a collection of individual knowledge. Being on Quora for College will make you a better thinker, which will be handy in whatever else you may pursue. It’s a bit like going to school, except way better.
Fortunately, many of you humans have been enlisted to provide both questions and answers as a cover, maintaining the appearance of just another innocuous website. Just as the purpose of humanity is the creation of the one great individual, Quora is the quagmire from which springs ‘The Wonderful Lizard of Awes’.
The mission of the Quora for College project is to share and grow the world’s knowledge. Quora for College is a knowledge exchange platform for the college campus, where anyone who knows something can share it. Sharing knowledge is a part of wisdom, and the platform reaches people who want to acquire knowledge.
The main aim of developing this Quora-like website for college students is to enable quality communication between students through a question-and-answer platform. The project was developed as a full-stack application.
The Quora for College project is divided into the following modules:
- Login and sign-up page
- Add question
- Reply to a pre-existing question
- Feed
Login/sign-up page: The login form is validated, and login details are stored in the database. Logging in is mandatory: without logging in or signing up, you will not be able to access the website. Users will also be able to log in with a Google (Gmail) account or sign in with Facebook.
Ask question: Quora for College lets you add a question on a topic that interests you and on which you want public opinion.
Reply to the question: Quora for College gives you a chance to showcase your knowledge by answering questions asked by others on subjects in which you’re knowledgeable.
Feed: Quora for College provides a feed in which you can see questions related to different topics.
The MERN stack was used to develop this academic project. The MERN stack consists of MongoDB, Express, React, and Node.js.
Top Quora Data Science Writers and Their Best Advice, Updated
Get some insight into tips and tricks, the future of the field, career advice, code snippets, and more from the top data science writers on Quora.
This post is based on Most Viewed Writers in Data Science , the 10 writers with the most answer views in the last 30 days, as retrieved on June 29, 2017.
Just so there is no confusion, please note that this post is "authored" by me, but none of the information contained herein -- from the questions to the answers -- has anything to do with me. I simply edited these informative responses together.

1. Håkon Hapnes Strand , Data Scientist - 255,104 views, 173 answers
Excerpt from answer to: What is a "full-stack" data scientist?
I haven’t heard the expression being used really, but here is my take on what it means: Data scientists build predictive models. That’s the core of what they do. In addition, they need to know a little bit of:
- Data engineering
- Software engineering
- Business analysis
A full-stack data scientist would be able to seamlessly perform the role of a data engineer, software engineer, business analyst and data scientist. If you needed someone to develop an app, the FSDS could step in and do it. If you needed someone to set up a data warehouse, or to analyze the strategic management processes of a business, the FSDS could do it.
2. Mike West , SQL Server and Machine Learning enthusiast - 127,776 views, 45 answers
Excerpt from answer to: Is Python still relevant in data science given the rise of Scala (+Spark)?
Scala and Spark aren’t Python’s rivals; they are friends. I’ve been saying this for some time now. Python is and will be the gold standard for machine learning over the next ten years. The only Python competitor is R and, I’ll be honest, in the real world everyone’s using Python. You’ll see a lot of R at the college level but not in the applied space. Python simply has too much of a head start. Big Data is simply about getting any data (almost always unstructured data) into a format that can be modeled. Scala and Spark are just tools you can use to do that on very large data sets. TensorFlow wasn’t written in Scala. Don’t get caught up in one or two articles even if they are written by Andrew Ng. Do your own research.
3. Corrin Lakeland - 117,841 views, 87 answers
Excerpt from answer to: What will data scientists be working on in 5 to 10 years from now?
This brings me to the future. Over the next five years I expect to see lots of companies that are currently claiming to be involved actually trying to use it on serious projects. I expect a good chunk of those projects to fail and the whole industry to have generally matured, with far more understanding of what works and what doesn’t. Look at the number of GUI tools that support machine learning now. Things like Excel add-ons that automatically cluster data. Give it five years and I expect most people to think only of them when they think about data science. In ten years I think fashion will have well and truly moved on. Data Science will be a skill that is common and expected in other disciplines, and specialist data scientists will be looked at a little strangely. You will also have a situation where it is common and normal for the data that is captured by systems to be amenable to data science, as opposed to what is happening now, where most data is structured in a way that requires significant manipulation.
4. William Chen , Data Scientist at Quora - 117,834 views, 195 answers
Excerpt from answer to: Why did you choose to work in data science over quantitative finance?
The summary of all of the reasons I’m about to list is that I chose data science because I was more passionate about it. Here are 5 of the more specific reasons that led to my passion for data science.
- Excitement over a new, emerging, and growing career path: This decision was made sometime between 2013 and 2014, when data science was even newer and more uncertain than it is today. The idea of entering something where things were still developing and new appealed to me, and still does today. I try not to base my decisions on hype, so this point is more about how the data science field was growing and would have a place for me than about how hot it was.
- Familiarity with data science: This is arguably the weakest reason on the list, but by the time that I had to choose what I would work on full-time, I had already had two data-science-related internships under my belt: one at Etsy (company) and one at Quora (company). I had great experiences at both of those internships, so choosing to work in data science full-time was a happy known quantity for me.
- Interest in working on a consumer internet product: I’ve had a longtime fascination with consumer internet products and have basically been excited to watch this whole space grow ever since I got access to dial-up. Working in data science was a unique opportunity for me to become a part of the consumer internet world I’ve been so fascinated by.
- Intrigue of working on a product that’s new and upcoming: Consumer internet products were always interesting to me since they live in the land of uncertainty and could potentially become really big (or just fail). The intrigue of working on a product that could potentially become really important and knowing that you had a small role in it was tempting.
- Commitment to knowledge-sharing: I’ve always been committed to sharing thoughts and ideas, either through being a teaching fellow for Harvard Stat 110 or writing as much as I can on Quora. Tech in general has a culture of meetups and blog posts and Quora answers and panels and invited talks. The same is not true in the secretive world of quantitative finance.
5. Clayton Bingham , Researcher in Center for Neural Engineering at University of Southern California - 108,512 views, 8 answers
Excerpt from answer to: In Python, how can I save data from a website to CSV using BeautifulSoup?
The lazy way would be to do something like this: Once you have your data in the dataframe you can do whatever parsing/reformatting you want. Or, if you only need this once you can just do that with Excel or something. I hope this helps!
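The code from the original answer did not survive in this excerpt. As a stand-in, and explicitly not the author's code, the sketch below shows the general shape of the task: parse an HTML table into rows, then write the rows out as CSV. To stay runnable without third-party packages it swaps BeautifulSoup and the dataframe for the standard library's `html.parser` and `csv` modules; the `TableParser` class and the sample HTML are made up for illustration.

```python
import csv
import io
from html.parser import HTMLParser


class TableParser(HTMLParser):
    """Collect the text of every <td>/<th> cell, grouped by <tr> row."""

    def __init__(self):
        super().__init__()
        self.rows = []
        self._row = None   # cells of the row being read, or None outside <tr>
        self._cell = None  # text pieces of the cell being read, or None

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None


# Made-up page content; in practice this would be the downloaded HTML.
html = """
<table>
  <tr><th>name</th><th>score</th></tr>
  <tr><td>Ada</td><td>90</td></tr>
  <tr><td>Linus</td><td>85</td></tr>
</table>
"""

parser = TableParser()
parser.feed(html)

# Write to an in-memory buffer; use open("out.csv", "w", newline="") for a file.
buffer = io.StringIO()
csv.writer(buffer).writerows(parser.rows)
csv_text = buffer.getvalue()
print(csv_text)
```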
6. Lili Jiang , Quora Data Science Manager - 88,461 views, 8 answers
Excerpt from answer to: As a data scientist, what tips would you have for a younger version of yourself?
First and foremost, is data science what you think it is? 9 out of 10 aspiring data scientists I come across equate machine learning with data science. “Data Science” is a loaded, catch-all term. Machine learning is a part of it, but at many major tech companies, product analytics is also an integral part of the data science team. Product analytics is a hidden gem. It is fun but doesn’t get talked about nearly as much. This includes:
- A/B test design
- Design metrics: Let’s take a video platform as an example. What is the best metric to optimize for that best represents user satisfaction? Should it be number of videos watched? Time spent watching videos? Percent of users that come back in a week to watch another video?
- Investigate why metrics change: Why is there suddenly a spike in activity in this cohort of users?
- Understand product mechanics: How do button X and feature Y improve the product? Should we redirect page A to B to C or go from A straight to C?
- Identify trends and offer strategic suggestions: Argue with data that the company should invest in ______ area to stay competitive in the field.
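As a rough illustration of what reading an A/B test can involve, the sketch below runs a two-proportion z-test on made-up conversion counts. The function name, the numbers, and the choice of this particular test are assumptions for the example; the answer above does not say which method the team uses.

```python
import math


def two_proportion_z_test(conv_a, n_a, conv_b, n_b):
    """Return (z, two-sided p-value) for the difference in conversion rates."""
    p_a, p_b = conv_a / n_a, conv_b / n_b
    pooled = (conv_a + conv_b) / (n_a + n_b)  # rate if A and B were identical
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se
    p_value = math.erfc(abs(z) / math.sqrt(2))  # two-sided normal tail area
    return z, p_value


# Hypothetical experiment: 100 of 1000 users convert in A, 130 of 1000 in B.
z, p_value = two_proportion_z_test(100, 1000, 130, 1000)
print(round(z, 2), round(p_value, 3))
```

A small p-value only says the difference is unlikely to be noise; deciding whether the metric itself reflects user satisfaction is the design question raised in the list above.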
7. Zeeshan Zia , PhD in Computer Vision and Machine Learning - 70,564 views, 24 answers
Excerpt from answer to: Is AI over-hyped in 2017?
Yes and no, depending upon which community you are talking about. If you are talking about the academic research community, it’s not over-hyped. There have been major breakthroughs in AI over the past couple of years, and the celebration is certainly justified. In my own area of object recognition, we went from ~35% accuracy (mean average precision on Pascal VOC) to above 65% in just 3–4 years. Previously, we were advancing by 1–2% per year, despite object recognition being the hottest area of computer vision with the largest fraction of papers appearing in top conferences every year. Deep learning also made major breakthroughs in reinforcement learning, which is what yielded successes in general Atari game playing, and beat a world grand master in Go decades ahead of expectations! It has finally enabled speech recognition to achieve usable levels of accuracy.
8. Jason T Widjaja , Business and analytics geek. Likes his brother. - 60,837 views, 167 answers
Excerpt from answer to: What is the risk of the hype around analytics/data science dying off, leaving lots of unemployed analysts?
Fundamentally, I do not see data science dying off anytime soon. As long as:
- people want to make better decisions (always),
- people care about what the future holds (forever),
- people and companies who do it well benefit (always),
- data points available continue to increase (forever),
- the tools and techniques we have continue to improve (you get the idea),
...analytics and data science are not going anywhere. Disclaimer: extremely biased sample size of one.
9. Roman Trusov , Master's Information Technology & Data Science, Skolkovo Institute of Science and Technology (2018) - 57,815 views, 139 answers
Excerpt from answer to: How should a data scientist handle versioning, both for pipeline code and models?
To get the best from a version control system, it’s better to separate them. Keeping code in a version control system just like any other code is the only logical way, because if you, as a DS, perform some heavy ETL or if your code makes decisions that can bring/cost a lot of money, there’s no way it’s going around code review. No. Way. For some things that are more typical for data scientists, though, I don’t think that storing Jupyter notebooks in version control is a good practice. You can’t see a decent diff on them, they are not “production code” and in general, when you are finished with something, you want to push at least a “camera-ready” Python script. Jupyter notebooks are great for experiments and demonstrations, but outside of these cases there’s always something better.
10. Shweta Doshi , Co-Founder GreyAtom,Data Science Immersive Learning school - 50,866 views, 123 answers
Excerpt from answer to: What is the essential knowledge and skills are required to start working as data scientist?
Essential knowledge you need to get yourself acquainted with falls under 3 categories i.e Programming, Maths and Science. As a data scientist you will be expected to take a business problem and translate it to a data question, create predictive models to answer the question and storytell about the findings. Statisticians that focus on implementing statistical approaches to data, and data managers who focus on running data science teams tend to fall in the data scientist role. Data scientists are the bridge between the programming and implementation of data science, the theory of data science, and the business implications of data.

Top 5 Must-Read Answers – What does a Data Scientist do on a Daily Basis?
- What does a data scientist do on a day-to-day basis? A popular and must-know question
- We analyze this question from a data scientist’s perspective through the lens of 5 detailed and insightful answers from experienced data scientists
Introduction
I’m a curious person by nature. Whenever I come across a concept I haven’t heard of before, I can’t wait to dig in and find out how it works. This has come in quite handy in my own data science journey.
But before I landed my first break in data science, I was always curious about what data scientists actually did every day. Was I supposed to simply build models all the time? Or was the oft-quoted saying about spending 70-80% of our time cleaning data actually true?
I’m sure you have asked (or at least wondered) about this too. The role of a data scientist might be the “sexiest job of the 21st century”, but what does that entail on a day-to-day basis?

I decided to research this. I wanted to expand my horizons and understand how data scientists look at their role in different domains (such as NLP). This helped me gain a broader understanding of our role and why we should always read different perspectives when it comes to data science.
So, here is a list of the top 5 answers to help you get a sense of what the typical routine of a data scientist is. Prepare to be surprised – building models isn’t the primary (and only) function in a data scientist’s day-to-day tasks!
I also encourage you to take part in a discussion on this question here. This will enrich your current understanding of what a data scientist does and your thoughts will foster a discussion among our community!
Note: I have taken the answers verbatim from Quora and added my thoughts right at the beginning of each answer. This will help you get a good perspective of what the answer covers without diluting the author’s thoughts. Enjoy!
Machine Learning is Very Process Oriented – Mike West
I like this answer because it’s crisp, to-the-point and simple. The author has even designed a flow diagram and explained his thought process in a wonderfully illustrated way. Here is his answer in full:

Machine learning engineers spend a ton of time in the first two pictures (or stages). The fun part is really in the third stage but it’s only a small part of what happens in the real world.
Some key things to keep in mind about data science in the real world:
- Almost all applied machine learning is supervised. That means we build models against structured datasets
- Data wrangling is a large part of what happens in the real world
- When you hear the word supervised, think classification and regression. Most of my models are classification problems
- Model building is about 20% of my work. Yep, that’s it!
- Many small and medium-sized companies don’t use deep learning at all. Why? Because structured data algorithms like XGBoost win every time
- Everything I do is programmatic
- Most real-world data resides in relational databases. It will be your job to craft queries to pull out the data you need
- Big Data is unstructured data. If you have to build your models against big data, then you’ll need to learn another set of skills
- The cloud is here to stay. I use BigQuery for my really large structured data. Most large models can’t be built on your laptop
- Computers are monolingual. They only speak numbers. When you pass data to your model, you are passing a highly structured, well cleansed numerical dataset
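Two of the points above (querying a relational database, and handing the model a purely numerical dataset) can be sketched in a few lines. This is a minimal illustration using Python's built-in `sqlite3` module; the `customers` table, its columns, and the values are hypothetical stand-ins for a real production database, not anything from Mike's answer.

```python
import sqlite3

# Hypothetical in-memory table standing in for a production database.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE customers (id INTEGER, plan TEXT, monthly_spend REAL, churned INTEGER)"
)
conn.executemany(
    "INSERT INTO customers VALUES (?, ?, ?, ?)",
    [
        (1, "basic", 20.0, 0),
        (2, "pro", 55.5, 0),
        (3, "basic", 18.0, 1),
        (4, "enterprise", 210.0, 0),
    ],
)

# Craft a query that pulls only the columns the model needs.
rows = conn.execute(
    "SELECT plan, monthly_spend, churned FROM customers ORDER BY id"
).fetchall()
conn.close()

# Models only "speak numbers": map each text category to an integer code.
plan_codes = {plan: code for code, plan in enumerate(sorted({r[0] for r in rows}))}
features = [[plan_codes[plan], spend] for plan, spend, _ in rows]
labels = [churned for _, _, churned in rows]
```

Integer codes are the simplest encoding; for models that would read the codes as an ordering, one-hot encoding is a common alternative.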
A Percentage-wise Breakdown of a Data Scientist’s Day-to-Day Role – Vinita Silaparasetty
I really like the use of visualization by Vinita. The percentage-wise description of each data science task is helpful and insightful. Vinita has also leaned on her experience to explain the step-by-step work a data scientist does. It’s a must-read answer!
Contrary to popular belief, Data Science is not all glamour. The following survey results by CrowdFlower accurately sum up a typical day for a Data Scientist:

There is a lot of backtracking involved. Sometimes you even need to be able to predict what consequences removing/adding a variable might have.
- Collecting Datasets : Data is the lifeline of Data Science, so we spend plenty of time curating it. On rare occasions, some projects might already have plenty of data
- Cleaning & Organizing Data: This is the most time consuming and crucial step in the entire process. It has a great impact on the final results. Usually, after this step, the once large amount of data reduces and so we may need to collect more data for effective training
- Data Mining: It is the practice of examining large pre-existing databases in order to generate new information. Once data is organized and stored in databases, we can finally begin to derive value from it by finding patterns within the data
- Building Training Sets & Test Sets: Once we have a decent amount of data, we need to split it into the training set and the test set. A training set is a set of data used to discover potentially predictive relationships. It contains all the information about the expected output. A test set is a set of data used to assess the strength and utility of a predictive relationship. It contains mixed variables
- Refining Algorithms: We start with a skeletal algorithm. It is very basic and defines roughly what output is expected. After a few sessions, the accuracy, precision, etc. are recorded and the algorithm is refined to maximize its efficiency
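The training/test split described above can be sketched with the standard library alone. This is a minimal illustration, not Vinita's code: the 80/20 ratio, the fixed seed, and the toy rows are assumptions chosen for the example.

```python
import random


def train_test_split(rows, test_fraction=0.2, seed=42):
    """Shuffle a copy of `rows` and split it into (train, test) lists."""
    shuffled = list(rows)
    # A seeded generator keeps the split reproducible between runs.
    random.Random(seed).shuffle(shuffled)
    n_test = int(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]


# Toy dataset: (feature, label) pairs.
data = [(x, x % 2) for x in range(10)]
train, test = train_test_split(data)
```

The model is fitted on `train` only; `test` is held back so that accuracy, precision, and similar metrics are measured on rows the model has never seen.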
Data Scientist Perspective from a Small-Sized Company – Justin Fister
This is a superb answer and one I can relate to. Note that machine learning, the most anticipated aspect of a data scientist’s job, only occupies 5% of the total time! Just like Vinita, he has also explained his tasks in terms of percentage. Here is Justin’s view:
- NLP-related tasks (15%) . It’s no surprise that PaperRater’s automated proofreading technology requires heavy use of parsers, taggers, regular expressions, and other NLP goodies as part of the core algorithms and feedback modules
- Machine learning (5%) . This tends to be the most enjoyable part. Data cleaning, feature extraction/engineering/selection, and model building
- Reporting and analytics (10%) . Running queries, reviewing analytics, and assisting with strategic decision making
- Data management (5%) . Setting up and managing database servers including MySQL, Redis, and MongoDB. Larger projects may require Hadoop or Spark
- General software development (40%) . Many data scientists have a background in computer science, so expect to pitch in if you have an applicable background. API integration, web-development, and wherever else I can add value. Even at an AI startup, most of the development is not going to involve AI
- Other (25%) . This includes a wide variety of tasks, including blog posts, marketing, management, technical documentation, technical support, website copy, emails, meetings, etc.
The “Data Scientist” is a bit of a Myth – Tim Kiely
The author, Tim Kiely, uses a Venn diagram to explain what data science is. Just take a look at this Venn diagram below – it will blow your mind. Tim additionally talks about what data scientists are supposed to be by taking a somewhat contradictory view of the general definition. Here is Tim’s answer:
The “Data Scientist” is a bit of a myth, in my opinion. Not to say they aren’t out there but they are far rarer than is popularly understood and are more of the exception than the rule.
I liken it to the “Web Master” title of the dot-com bubble – these supposed people who could do full stack programming, front end development, marketing, everything. All of those roles/skills were always specialized and remain so today.
“Data Scientists” are supposed to be database architects, understand distributed computing, have a deep understanding of statistics AND some area of business or field expertise. That’s asking a lot when any one of those skill sets can take a career to build.

The Data Scientists I’ve worked with typically have a Ph.D. in A.I. or Machine learning and are effective communicators, which gives them the ability to direct the analysts, DevOps people, programmers and DBA’s at their disposal to solve problems with data-driven solutions. They outline the desired solution and leave it to their teams to fill in the gaps.
Machine Learning Engineer Working on NLP Tasks – Evan Pete Walsh
Let’s drill down into a particular specialization of machine learning. One of my favorites – Natural Language Processing (NLP) ! I wanted to bring out a machine learning engineer’s view here (a role every data scientist should become familiar with). Check out Evan’s full response:
Currently working on NLP, for the most part, including intent classification and entity extraction. Here’s a typical day for me:
- Get to work, pull up GitHub and check on the ZenHub board (kind of like Jira, except way cooler). I had some models that were training last night on our servers and I should have gotten an email that they finished. I did!
- I’ll probably spend a few minutes testing those new models and then tweak some parameters, then restart the training process
- The rest of the day I’m usually head-down coding, either working on a back-end Python application that will supply the AI for one of our products, or implementing a new algorithm that I want to try out
- For example, recently I read a paper on coupled simulated annealing (CSA), and I wanted to try it out on tuning the parameters for XGBoost as an alternative to a grid search. CSA is a generalized form of simulated annealing (SA), which is an algorithm for optimizing a function that doesn’t use any information on the derivative of the function
- Unfortunately, I couldn’t find an implementation in Python, so I decided to write my own. Two days later, I had submitted my first package to PyPI!
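To make the derivative-free idea concrete, here is a minimal sketch of plain simulated annealing (SA) on a one-dimensional function. It is not the coupled variant (CSA) and not Evan's package; the step size, cooling schedule, and toy objective are all assumptions picked for illustration.

```python
import math
import random


def simulated_annealing(objective, start, step=0.5, temp=1.0, cooling=0.95,
                        iterations=2000, seed=0):
    """Minimize `objective` using only function values, never derivatives."""
    rng = random.Random(seed)
    current, current_val = start, objective(start)
    best, best_val = current, current_val
    for _ in range(iterations):
        candidate = current + rng.uniform(-step, step)
        candidate_val = objective(candidate)
        delta = candidate_val - current_val
        # Always accept improvements; accept worse moves with a probability
        # that shrinks as the temperature cools.
        if delta < 0 or rng.random() < math.exp(-delta / temp):
            current, current_val = candidate, candidate_val
            if current_val < best_val:
                best, best_val = current, current_val
        temp = max(temp * cooling, 1e-9)
    return best, best_val


# Toy objective with its minimum at x = 3.
best_x, best_val = simulated_annealing(lambda x: (x - 3.0) ** 2, start=0.0)
```

For hyperparameter tuning, `objective` would instead train a model with the candidate parameters and return its validation error, which is why a method that needs no gradient is attractive there.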
The data scientist role is truly multi-faceted, isn’t it? A LOT of aspiring data scientists assume that they will primarily be building models all day long but that simply isn’t the case.
There are all sorts of tasks involved in a typical data science project which you’ll find yourself working on day-to-day. I quite like that because it opens up avenues to learn new concepts and apply them in the real world.
I’ll be posting some more career-related articles on Analytics Vidhya, so stay tuned and keep learning!

About the Author
Shubham Singh


JavaScript in Plain English

Nov 27, 2020
My Computer Science Answer Went Viral On Quora
What are some tips for first-year (freshman year) computer science students.
Three years ago, I used to write very actively on Quora. I had just graduated from university and started working at a job that, let’s say, was not my dream job. I reflected on my four years of study and realized there were things I could have done differently that could have changed the course of my career (which, by the way, I was able to change later). I had my experience and my newfound wisdom that I wanted to share with other people who were starting their degree. By that time, I had already answered a few random questions and was reading a lot on Quora. So, one day I was getting my daily dose of information on Quora and landed on this question.
What are some tips for first-year BTech CSE students?
I immediately started answering the question and poured all my wisdom into the answer. I provided tips just like an elder brother would share with his younger sibling, or a senior would give to his junior. I knew I was giving the best knowledge I had at that time and ended up spending more than an hour on it. After making sure I had written everything I wanted to write, I pressed the “Submit” button, and the next thing I knew, the answer had been read by five thousand people in a single night. I was surprised to see that I was able to connect with so many people. The response has about 17 thousand views, which I know may not sound like a huge number, but I wasn’t even expecting that because earlier my answers used to have a few hundred views at the maximum. So, now that I have picked my long-lost passion for writing back up, I decided to share that answer here and will also try to add the additional three years of learning at the end.
Here is the answer that I wrote:
Computer Science is a great branch and is quite interesting too if one is passionate about it. You can be a pro in a year or two if you work right from the first year and every day. So, here are a few tips which you can follow to excel in your career:
- Coding is what makes you different from others. You should read, write, learn and practice coding. Try to find a new solution in your own way. There is always more than one solution to a problem. There are a number of coding websites available, e.g. leetcode and codechef, where you can practice and learn a lot.
- The good thing with CSE is that you can be on par with or even ahead of those already in the industry if you work hard, but in other branches you don’t get much practical knowledge until you get into the industry.
- There are a number of online learning sites like COURSERA, EDX and a lot of others where you can do an online course on almost every aspect of CSE. Browse those websites, find out what interests you, enrol in those courses and religiously attend them on time. The problem is that it is easy to enrol but difficult to be consistent. So be regular while you attend those courses.
- Try to find internships immediately after the second year to get practical exposure.
- Participate in coding events. Don’t hold yourself back because you are a starter and don’t know enough. At least participate, and you will realize there are many like you who have the same skill level as you.
- To motivate yourself to work more, join freelancing websites, work on projects and earn. You can search on the internet or Quora to find out more about freelancing.
- Start your blog to implement what you have learnt.
- Last but not least, maintain at least an average GPA.
When I looked at these points again, I realised that while a lot of them are still valid today, these tips are more focused on preparing one for the job market. So, I will try to add perspective, not only to prepare you for the industry but also to keep in mind things that can shape your career in the long run.
Understand the different career paths
Graduation in computer science comes with an overwhelming number of career options. It is essential to understand different career paths to make the best out of your time. The best way would be to research the different paths one can choose and try experimenting with different options to find out what interests you. The more involved you are in hackathons, conferences, projects and courses, the more you are going to discover what suits you.
Get involved with the tech community
The tech community is very active and continuously contributing towards lifting up fellow developers. To make the best out of college, try to get involved in the community as early as possible. You can get involved in many ways, like helping others, writing articles, contributing to open source projects, or answering questions on Stack Overflow. The more involved you are, the more you learn.
Stay up to date with the tech industry
The software industry is ever-changing and growing, so much so that learning should never stop. Newer and better technologies arrive every day, and it is essential to stay up to date with them if you want to excel in this field. I try to stay updated by reading articles on Medium and Dev.io. You can read tech blogs from different tech companies to understand the challenges they face and how they tackle them.
I hope this article will help you get motivated to learn and make the best out of time. Also, college time is the best time, so try to enjoy as much as you can.
Happy learning!

Gurdip Singh
A software engineer, an avid reader with a keen interest in technical writing

Towards Data Science

Jun 8, 2017
Identifying duplicate questions on Quora | Top 12% on Kaggle!
If you are a regular Quoran like me, you have most likely stumbled on duplicate questions asking the same essential question.
This is a bad user experience for both writers and seekers, as the answers get fragmented across different versions of the same question. In fact, this is a problem which is very evident on other social Q&A platforms like StackOverflow. A real-world problem out there, begging AI to solve it :)
For the past month or so i’ve spent my nights cracking my skull open at this problem, along with three thousand other kagglers! It was my second serious competition in a row, it was stressful at times, but i learnt a lot overall — Landed a Top 12% position, a discussion gold medal and a couple of kernel bronze medals :)
About the problem: Quora has given an (almost) real-world dataset of question pairs, with the label of is_duplicate along with every question pair. The objective was to minimize the logloss of predictions on duplicacy in the testing dataset. There were around 400K question pairs in the training set while the testing set contained around 2.5 million pairs. Yeah, 2.5 million! A large majority of those pairs were computer-generated questions to prevent cheating, but 2 and a half million, god! I was maxing out my poor 8GB machine every other hour :(
My approach: I started off with the xgboost starter by @anokas, and built upon it gradually. My feature-set involved around 70 features which is on a rather lower range, when compared to the Top Kagglers’ approaches. My features could be broadly classified into NLP-based features, word embedding-based distances and graph-based features. Let me elaborate:
NLP Features: Some very obvious text-based features are percentage of words matching between the two questions, length of the two questions, number of words, number of sentences, number of stopwords, the usual natural language stuff! I tried going ahead with tfidf scores, but that was both not very useful and computationally expensive. Instead, one of the most important features turned out to be weighted word match share, where each word’s weight is the inverse frequency of the word in the corpus (basically idf): if i have a rare word in both the questions, they might be discussing similar topics. I also had features like cosine distance, jaccard distance, jarowinkler distance, hamming distance, and n-gram matches (shingling).
A library i used extensively for NLP tasks was Spacy, which has been developing some great functionalities lately — spacy similarity turned out to be a good feature too. Some creative ones i thought of were related to the kind of question — whether it is a “How” question or “Why” question — dependent on the first word of the sentence. Strangely, when modelling this i should’ve thought of building “last word similarity” as well, completely missed that! Named entities are a crucial key to understand the context of a question — thus common_named_entity score and common_noun_chunk score were an obvious choice. I pushed out a kernel about calculating similarity via the wordnet corpus, however wordnet fell short of spacy in terms of ease of use, speed and vocabulary size.
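The weighted word match share and the Jaccard distance mentioned above can be sketched as follows. This is a minimal illustration, not the competition code: whitespace tokenization, the `log(N / df)` form of idf, and the three-question corpus are all assumptions.

```python
import math
from collections import Counter


def idf_weights(questions):
    """Weight each word by log(N / document frequency); rare words weigh more."""
    n = len(questions)
    doc_freq = Counter(word for q in questions for word in set(q.lower().split()))
    return {word: math.log(n / df) for word, df in doc_freq.items()}


def weighted_word_match_share(q1, q2, weights):
    """Share of the total idf weight carried by words present in both questions."""
    w1, w2 = set(q1.lower().split()), set(q2.lower().split())
    total = sum(weights.get(w, 0.0) for w in w1) + sum(weights.get(w, 0.0) for w in w2)
    if total == 0:
        return 0.0
    # Each shared word appears once in each question, hence the factor of 2.
    return 2 * sum(weights.get(w, 0.0) for w in w1 & w2) / total


def jaccard_distance(q1, q2):
    """1 - |intersection| / |union| over the two questions' word sets."""
    w1, w2 = set(q1.lower().split()), set(q2.lower().split())
    union = w1 | w2
    return 1.0 - len(w1 & w2) / len(union) if union else 0.0


# Hypothetical toy corpus.
corpus = [
    "how do i learn python",
    "how do i learn java",
    "what is the capital of france",
]
weights = idf_weights(corpus)
share_similar = weighted_word_match_share(corpus[0], corpus[1], weights)
share_unrelated = weighted_word_match_share(corpus[0], corpus[2], weights)
```

In the toy corpus the first two questions share only common words, so the rare, unshared words ("python", "java") pull the share well below 1 even though four of five words match.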
Word Embeddings based Distances :
When doing an NLP competition, can Word2Vec be left behind? I feel like word2vec might be the coolest computer science concept i’ve read, i always get blown away by its effectiveness. Every.Single.Time.
Anyway, i had the questions mapped to 300-dimensional vectors in a Sent2Vec format, resulting in a vector for every question. Naturally, distance-based features between vectors were built: cosine, cityblock, jaccard, canberra, euclidean and braycurtis. Must mention @abhishek’s scripts for the inspiration for these features. Unfortunately, i couldn’t build the real embedding layers which i could pass in to the lstm layers in keras, since my RAM just wouldn’t let me. I’m getting some serious hardware, soon!
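For reference, most of the distances named above are short formulas over a pair of vectors. The sketch below writes them in plain Python using the standard definitions; the 3-dimensional vectors are hypothetical stand-ins for the 300-dimensional question vectors, and the cosine function assumes neither vector is all zeros.

```python
import math


def cosine(u, v):
    """1 - cos(angle between u and v); assumes both vectors are non-zero."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / norm


def cityblock(u, v):
    """Sum of absolute coordinate differences (Manhattan distance)."""
    return sum(abs(a - b) for a, b in zip(u, v))


def euclidean(u, v):
    """Straight-line distance."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def canberra(u, v):
    """Sum of |a - b| / (|a| + |b|), skipping coordinates where both are zero."""
    return sum(abs(a - b) / (abs(a) + abs(b)) for a, b in zip(u, v) if a or b)


def braycurtis(u, v):
    """Sum of |a - b| divided by sum of |a + b|."""
    return sum(abs(a - b) for a, b in zip(u, v)) / sum(abs(a + b) for a, b in zip(u, v))


# Hypothetical question vectors.
q1_vec = [1.0, 0.0, 2.0]
q2_vec = [2.0, 1.0, 2.0]
distance_features = {
    f.__name__: f(q1_vec, q2_vec)
    for f in (cosine, cityblock, euclidean, canberra, braycurtis)
}
```

Each distance becomes one column in the feature table, so a single pair of question vectors yields several numeric features for the model.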
Graph Features: In an NLP competition, these graph features played quite the spoilsport! However, it was a nice reminder of how the theories of social networks might apply in datasets like Quora’s question pairs.
Here, every question is a node in the graph and a question pair in the dataset indicates an edge between the two nodes. We can use the graph structure of the test data as well, since we’re not considering the is_duplicate label anywhere — Those 2 million edges contributed a whole lot to the graph! There were heated discussions between kagglers on whether the graph-based features are supposedly “magic features”, which should be released to level the playing field. Anyway, all of these features gave significant boosts to most models:
- Degree of a node: Essentially, frequency of the question, there’ll be as many edges as many times that question has occurred in the dataset. This feature gave a huge gain, because the questions sampling(upsampling the duplicates) done by Quora was most likely dependent on this frequency.
- Intersection of neighbors: Percentage of first-degree neighbors of the question pair, for eg. Q1 has neighbors Q2,Q3,Q4 and Q2 has neighbors Q1,Q3. The common neighbors for (Q1,Q2) are Q3, one half of all first degree neighbors.
- Degree of Separation: This was a feature i thought of and implemented via breadth-first-search, didn’t give a lot of improvement though.
- PageRank: I implemented this feature, even put out a kernel on kaggle — Higher pageranked questions are linked to important(higher pageranked) questions, while spammy questions are linked to spammy ones.
- kcore/kclique: K-core is basically the largest subgraph where every node is connected to at least “k” nodes, didn’t give me a lot of gain though. I had thought of kclique, but didn’t implement it because of lack of time :( Turned out it was a pretty important one!
- Weighted Graph: Later in the competition, fellow kagglers shared the idea of having a weighted graph, where weight of every node is the weighted_word_share(we’d discussed this earlier). Neighbor intersection in this weighted graph was useful.
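Three of the features in the list above (node degree, neighbor intersection, and degree of separation via breadth-first search) can be sketched with the standard library. This is an illustrative reconstruction, not the competition code. Two choices are assumptions: neighbors are stored as sets, so a repeated pair counts once toward the degree, and the direct edge between the two questions of a pair is ignored when measuring separation.

```python
from collections import defaultdict, deque


def build_graph(pairs):
    """Each question is a node; each question pair is an undirected edge."""
    graph = defaultdict(set)
    for q1, q2 in pairs:
        graph[q1].add(q2)
        graph[q2].add(q1)
    return dict(graph)


def degree(graph, q):
    """Number of distinct questions that q has been paired with."""
    return len(graph.get(q, ()))


def neighbor_intersection(graph, q1, q2):
    """Common neighbors as a fraction of all first-degree neighbors of the pair."""
    n1 = graph.get(q1, set()) - {q2}
    n2 = graph.get(q2, set()) - {q1}
    union = n1 | n2
    return len(n1 & n2) / len(union) if union else 0.0


def degree_of_separation(graph, start, goal, ignore_direct=True):
    """Shortest path length found by BFS; None if goal is unreachable."""
    if start == goal:
        return 0
    seen = {start}
    queue = deque([(start, 0)])
    while queue:
        node, dist = queue.popleft()
        for nxt in graph.get(node, ()):
            if ignore_direct and node == start and nxt == goal:
                continue  # skip the edge that the pair itself contributes
            if nxt == goal:
                return dist + 1
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, dist + 1))
    return None


# The example from the list above: Q1 neighbors Q2, Q3, Q4; Q2 neighbors Q1, Q3.
graph = build_graph([("Q1", "Q2"), ("Q1", "Q3"), ("Q1", "Q4"), ("Q2", "Q3")])
shared = neighbor_intersection(graph, "Q1", "Q2")
separation = degree_of_separation(graph, "Q1", "Q2")
```

On this graph `shared` reproduces the "one half" from the neighbor-intersection bullet, and `separation` follows the detour Q1, Q3, Q2.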
Transitivity Magic — Continuing on the graph structure, modelling transitivity between the questions was an obvious approach. For eg. if Q1 is similar to Q2 and Q2 is similar to Q3, it implies Q1 is similar to Q3 more often than not(as per our dataset). This was one of the features i had begun building but left it midway, a costly mistake. Many top solutions used this feature in some form or another, a simpler version was to average probabilities of duplicacy between every question and its counterpart’s neighbors.
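One possible reading of the "simpler version" above is sketched below: for a pair (Q1, Q2), average the first-stage model's predicted probabilities for Q1 against Q2's other neighbors, and for Q2 against Q1's other neighbors. The function name, the `frozenset` keys, the fallback value, and the toy numbers are all assumptions; the top solutions may have done this differently.

```python
def neighbor_averaged_probability(q1, q2, base_prob, graph):
    """Average known duplicate probabilities between each question and its
    counterpart's other neighbors; fall back to the pair's own prediction."""
    probs = []
    for question, counterpart in ((q1, q2), (q2, q1)):
        for neighbor in graph.get(counterpart, set()) - {question}:
            key = frozenset((question, neighbor))
            if key in base_prob:  # only pairs that were actually scored
                probs.append(base_prob[key])
    if not probs:
        return base_prob.get(frozenset((q1, q2)))
    return sum(probs) / len(probs)


# Hypothetical first-stage predictions, keyed by unordered question pair.
base_prob = {
    frozenset(("Q1", "Q2")): 0.9,
    frozenset(("Q2", "Q3")): 0.8,
    frozenset(("Q1", "Q3")): 0.7,
}
graph = {"Q1": {"Q2", "Q3"}, "Q2": {"Q1", "Q3"}, "Q3": {"Q1", "Q2"}}
smoothed = neighbor_averaged_probability("Q1", "Q3", base_prob, graph)
```

Here Q1 and Q3 both look like duplicates of Q2, so the averaged value for (Q1, Q3) comes out higher than the pair's own 0.7 prediction, which is the transitivity effect the paragraph describes.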
My Whiteboarding Sessions
Some fuckups.
When a noob like me dives head-first into kaggling big-time, there are bound to be fuckups. Even though i make a conscious effort to keep my pipeline modular and version-controlled, i lost almost a weekend’s worth of work because of a nasty bug in my pipelining. Learning it the hard way, i git reverted and remodelled the pipeline on an ipython notebook . A major challenge that i faced when building features was writing memory-efficient code and not rebuilding previous features, modularity was key. This led to another fuckup — since i couldn’t handle the whole of my testing dataset in my RAM, i was breaking it into six subsets and building features iteratively. This meant i was doing a pandas concat between the older dataframe with the new feature, little did i concentrate on indexing :( Spent more than a couple days scratching my head at all NaN features. Noob mistake.
The QID Conundrum:
The training dataset had an ID for every question, thus a QID1 and QID2 in every row. However, this wasn’t the case with the testing dataset which meant “QID” couldn’t be used as a feature straightaway. A fellow kaggler released an incredibly creative observation of the decreasing average duplicate ratio (rolling mean) with increasing QID, most probably an indication of Quora’s improving algorithm with time, thus reducing the number of duplicate questions with increasing ID. This inference is based on the assumption that the QID values aren’t masked and are truly representative of time of posting the question. To model QIDs in our testing dataset, i had a hashtable which mapped question texts to QIDs. Now iterating over all questions in test_df: if i encountered an existent question, the corresponding QID was assigned to it. Else we assumed that this was a new question in order of time of posting, incrementing QID by 1. This led to various features like QID difference, mean QID, min QID with the hope of modelling the decrease of duplicacy over time.
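The hashtable approach described above can be sketched as follows. The function names and the toy questions are hypothetical; the logic follows the paragraph: reuse the known QID when the question text has been seen, otherwise hand out the next integer.

```python
def assign_qids(known_qids, test_questions):
    """Map each test question text to a QID.

    known_qids: dict of question text -> QID taken from the training set.
    Unseen questions are assumed to be newer and get the next free QID.
    """
    text_to_qid = dict(known_qids)
    next_qid = max(text_to_qid.values(), default=0) + 1
    assigned = []
    for text in test_questions:
        if text not in text_to_qid:
            text_to_qid[text] = next_qid
            next_qid += 1
        assigned.append(text_to_qid[text])
    return assigned


def qid_features(qid1, qid2):
    """Pairwise features built from the two (real or assigned) QIDs."""
    return {
        "qid_diff": abs(qid1 - qid2),
        "qid_mean": (qid1 + qid2) / 2,
        "qid_min": min(qid1, qid2),
    }


# Hypothetical training QIDs and test questions.
known = {"what is ai?": 1, "how do i learn python?": 2}
test_questions = [
    "how do i learn python?",
    "is kaggle worth it?",
    "what is ai?",
    "is kaggle worth it?",
    "what is xgboost?",
]
qids = assign_qids(known, test_questions)
features = qid_features(qids[0], qids[1])
```

The assigned IDs are only a proxy for posting time, and they are meaningful only under the assumption stated in the paragraph that real QIDs are unmasked and chronological.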
Class Imbalance:
A major part of discussions were centered around guesstimating the class split in the testing dataframe, which clearly wasn’t similar to the training split. The mathy folks figured a narrow range of the split via a couple of constant valued submissions, here and here. There were around 34% positive duplicates in the training set, while the testing set had an estimated 16%-17% positive duplicates, probably a result of the improved Quora algorithm or a result of the computer-generated question pairs. Anyway, a misrepresentative training dataset wasn’t going to help matters: people came up with oversampling solutions (duplicating the negative rows in training), or rescaled their predictions by an appropriate factor.
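The rescaling option can be illustrated with the standard prior-shift correction, which reweights the odds of a prediction by the ratio of test to training class rates. This is one common way to do it, not necessarily what any competitor used; the 0.34 training rate comes from the text above, while 0.165 is an assumed midpoint of the estimated 16%-17% range.

```python
def rescale_probability(p, train_rate=0.34, test_rate=0.165):
    """Adjust a probability predicted under the training class balance so that
    it reflects the (estimated) test class balance."""
    pos_ratio = test_rate / train_rate               # reweights the positive class
    neg_ratio = (1.0 - test_rate) / (1.0 - train_rate)  # reweights the negative class
    return pos_ratio * p / (pos_ratio * p + neg_ratio * (1.0 - p))


# A prediction equal to the training base rate maps to the test base rate.
adjusted = rescale_probability(0.34)
```

Since logloss penalizes overconfident probabilities heavily, shifting predictions toward the true test base rate can lower the score without retraining the model.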
My models — XGBoost is love, XGBoost is life
Very embarrassed to admit, but my submission was just a single model solution, a 2000-round xgboost. To be fair, i didn’t spend a lot of time on parameter tuning or building diverse models, left it for too late. I tried the default random forest and GBM, not to a better effect than XGB. Stacking and ensembling were in the roadmap too, but that just stayed there :(
Instead, i spent a very long weekend learning Keras and building dense neural nets, ’twas my first time! Went through understanding various hyperparameters and ideal architectures, the theory behind activation functions, dropout layers and optimizers! Very interesting stuff! I played around with a two-layer sigmoid neural net on my 70 features, tuning it to perfection; it never came close to the xgboost though. As i mentioned earlier, i couldn’t build embedding layers or LSTMs with my hardware, that would’ve definitely helped. Soon.
Proud of my modelling pipeline though, didn’t have to fret over making a function for every model– will be helpful when i build hundreds of models to stack. Soon.
Top Solutions:
Post-competition writeups of the gold medalists just makes me feel like a complete noob! I have to really up my game, and work harder to get up there, the bronze isn’t far away! Some of the major learnings from winners’ solutions:
- Most teams rescaled their final predictions based on the weighted graph’s edges or the frequency of nodes
- Every top team had built a stacker with hundreds of models, with a reusable pipeline for model building.
- State-of-the-art neural net architectures(Siamese/Attention NNs), LightGBMs were used, but even lower performing models like ExtraTrees or Random Forests help!
- Graph features were the core of many solutions, as teams used various techniques to derive value out of it. Some removed spurious(less frequent) nodes, some used mean/median of neighbors’ weights, while most modelled transitivity.
- Apparently, being Question1 or Question2 mattered too. Surprising!
- NLP: process the text in many different ways — lowercase and unchanged, punctuation replaced in different ways, stop words included and excluded, stemmed and not stemmed, etc.
- Stacking is important, but people achieved 0.14x with a single xgb model too.
- First place post . Truly humbling.
Conclusions :
I should manage my time better, assigning appropriate time for exploration, feature engineering, model building and stacking. Really felt the crunch in the last week as i ran short of submissions.
- My modelling pipeline was good for testing, but i need to scale up operations on building more models on different hyperparams/subsets of data.
- Don’t give up on building features midway, some of them turned out to be costly misses.
- Don’t be too bent up on a feature just because you spent time on it. This is kinda funny lol!
- Spend a lot of time on Kaggle Kernels and Discussions. They’re damn cool :)
So yeah, that’s that. One competition ends, another begins. Kaggle is addictive!
This was originally posted on my blog. One Blog Every Week

Shubhankar Srivastava