
Machine Learning: Choosing a Model Basics

 

Let’s make data science an easy takeaway. All you need is a little knowledge of Python, Python libraries, statistics, algebra, Jupyter Notebook, and other programming tools. Just fall in love with the journey that is about to start.

Table of Contents:

·        Data Science Phases & My Familiarity With Concept

·        Facing Emotions

·        What is Machine Learning?

·        Types of Machine Learning

·        Metrics To Evaluate Machine Learning Algorithms Using Python

·        Choosing An Algorithm for Machine Learning!

·        Cheatsheet: Machine Learning Algorithms (Python & R Code)

·        About the Author & Where to Find ME!

Data Science Phases & My Familiarity With Concept:

·        Define Business Problem (Familiar with)

·        Data Collection (Familiar with)

·        Data Cleaning (Familiar with)

·        Data Analysis (Familiar with)

·        Predictive Analysis (NOW LEARNING)

·        Validating Model (NEXT UP)

·        Deployment (COMING SOON)

Facing Emotions!

It is easy to feel overwhelmed when the elevator is out and the only way to reach an apartment on the 12th floor is to … take the stairs. How overwhelming something feels depends on the person: one person may love a task that another hates. Emotions are temporary. Right now, I’m definitely overwhelmed by learning the Python coding, terminology, and statistics associated with machine learning. WHY am I overwhelmed? I believe it is because all of this is new and unknown to me at this moment. By the end of writing this blog post, I’ll be a step closer to becoming a data scientist.

What is Machine Learning?

Machine learning sits at the intersection of math/statistics and computer science. Put simply, machine learning means teaching a machine by feeding it data, labeled or unlabeled, so that it can predict an outcome, and the machine develops knowledge of the topic over time.


Types Of Machine Learning



Supervised Learning: This type of algorithm has a target/outcome variable (dependent variable) that is predicted from a given set of predictors (independent variables). Using this set of variables, a function is learned that maps inputs to desired outputs. The training process continues until the model achieves a desired level of accuracy on the training data. Examples of Supervised Learning: Regression, Decision Tree, Random Forest, KNN, Logistic Regression, etc.

Unsupervised Learning: In this type of algorithm, there is no target or outcome variable to predict or estimate. Instead, it is used for clustering a population into different groups, which is widely used for segmenting customers into groups for specific interventions. Examples of Unsupervised Learning: Apriori algorithm, K-means.

Reinforcement Learning: Using this type of algorithm, the machine is trained to make specific decisions. It works this way: the machine is exposed to an environment where it trains itself continually using trial and error. The machine learns from past experience and tries to capture the best possible knowledge to make accurate business decisions. Example of Reinforcement Learning: Markov Decision Process

Metrics To Evaluate Machine Learning Algorithms Using Python

Before we look at the process of choosing an algorithm, it is important to note the goal of our metrics: after we create a model, they tell us whether it is a “good” model compared to the others we could use.

Classification metrics:

Classification Accuracy

Log Loss

Area Under ROC Curve

Confusion Matrix (Method for classification prediction results)

Classification Report (Method for classification prediction results)
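To make the classification metrics above concrete, here is a minimal sketch using scikit-learn's `sklearn.metrics` module. The labels and probabilities are made-up toy values (an assumption for illustration, not data from this post), and the 0.5 threshold is just a common default.

```python
from sklearn.metrics import (
    accuracy_score,
    classification_report,
    confusion_matrix,
    log_loss,
    roc_auc_score,
)

# Hypothetical toy data: true classes and predicted probabilities of class 1.
y_true = [0, 0, 1, 1, 1, 0]
y_prob = [0.1, 0.4, 0.8, 0.6, 0.3, 0.2]

# Turn probabilities into hard class predictions with a 0.5 threshold.
y_pred = [1 if p >= 0.5 else 0 for p in y_prob]

accuracy = accuracy_score(y_true, y_pred)  # share of correct predictions
loss = log_loss(y_true, y_prob)            # penalizes confident wrong probabilities
auc = roc_auc_score(y_true, y_prob)        # how well the scores rank 1s above 0s
cm = confusion_matrix(y_true, y_pred)      # rows = true class, columns = predicted class

print(f"Accuracy: {accuracy:.3f}")
print(f"Log loss: {loss:.3f}")
print(f"ROC AUC:  {auc:.3f}")
print("Confusion matrix:")
print(cm)
print(classification_report(y_true, y_pred))
```

Note that accuracy, the confusion matrix, and the classification report work on hard predictions, while log loss and ROC AUC work on the predicted probabilities.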

Regression metrics:

Mean Absolute Error

Mean Squared Error
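The two regression metrics above are simple enough to compute by hand. The sketch below uses plain Python on made-up values (hypothetical, for illustration only); in practice, scikit-learn's `mean_absolute_error` and `mean_squared_error` compute the same quantities.

```python
# Hypothetical toy data: actual values and model predictions.
y_true = [3.0, 5.0, 2.0, 7.0]
y_pred = [2.5, 5.0, 4.0, 8.0]

# Error of each prediction (actual minus predicted).
errors = [t - p for t, p in zip(y_true, y_pred)]

# Mean Absolute Error: average size of the errors, in the target's own units.
mae = sum(abs(e) for e in errors) / len(errors)

# Mean Squared Error: average squared error, which punishes large misses more.
mse = sum(e ** 2 for e in errors) / len(errors)

print(f"MAE: {mae}")  # 0.875
print(f"MSE: {mse}")  # 1.3125
```

Because the errors are squared, the single miss of 2.0 accounts for most of the MSE, while it counts only in proportion to its size in the MAE.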

Validating Results for Clustering

Internal validation, which revolves around two properties: cohesion within each cluster and separation between different clusters

External validation, which compares the clusters against known ground-truth labels when they are available
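As a rough illustration of both kinds of clustering validation, the sketch below clusters synthetic data with k-means. The silhouette score is one common internal measure that combines cohesion and separation, and the adjusted Rand index is one common external measure. The dataset, the number of clusters, and the choice of these two scores are assumptions for the example, not something prescribed by this post.

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import adjusted_rand_score, silhouette_score

# Synthetic data with 3 well-separated groups; true_labels are known
# only because we generated the data ourselves.
X, true_labels = make_blobs(
    n_samples=300, centers=3, cluster_std=0.6, random_state=42
)

# Cluster without using the labels.
kmeans = KMeans(n_clusters=3, n_init=10, random_state=42)
cluster_labels = kmeans.fit_predict(X)

# Internal validation: uses only the data and the assigned clusters.
# Ranges from -1 to 1; higher means tighter, better-separated clusters.
internal_score = silhouette_score(X, cluster_labels)

# External validation: compares assigned clusters with the known labels.
# 1.0 means the grouping matches perfectly (cluster numbering is ignored).
external_score = adjusted_rand_score(true_labels, cluster_labels)

print(f"Silhouette score:    {internal_score:.3f}")
print(f"Adjusted Rand index: {external_score:.3f}")
```

External validation is only possible when true labels exist, which is rare in real clustering problems, so internal measures are usually the practical option.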

Choosing An Algorithm for Machine Learning!



Here is how I think about the process:

·        Think about the problem & what you are looking to predict (Are you looking to predict a number, classify something, etc.?)

·        Import Python packages

·        Pick your 1st model. Algorithm types: Supervised Learning, Unsupervised Learning, and Reinforcement Learning

·        Split the data: test & train (Split 1st to avoid data leakage)

·        Scale the data (ONLY the x values, with the scaler fitted on the training set)

·        Cross-validate

·        Fit your model (regularization happens here)

·        Predict

·        Check metrics & evaluate (the metrics used to evaluate performance differ by model type)

·        (Optional) Compare your model by running another algorithm of the same machine learning type & repeat the steps above to compare the evaluations. You may tune the hyperparameters and repeat the same process until you achieve the desired performance. Your final model selection will depend on the evaluation metrics for the chosen model and problem.
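The steps above can be sketched end to end with scikit-learn. This is only one possible version under stated assumptions: the data is synthetic (`make_classification`), the first model is a logistic regression, the comparison model is a random forest, and accuracy is the metric. Swap in your own data, models, and metrics.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import cross_val_score, train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Problem: classify something. Synthetic stand-in for a real dataset.
X, y = make_classification(n_samples=500, n_features=10, random_state=0)

# Split first, so nothing about the test set leaks into training.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0, stratify=y
)

# Scaling (x values only) and the model live in one pipeline, so the
# scaler is fitted on training data only, including inside each CV fold.
# C controls the regularization strength of the logistic regression.
model = make_pipeline(StandardScaler(), LogisticRegression(C=1.0, max_iter=1000))

# Cross-validate on the training set.
cv_scores = cross_val_score(model, X_train, y_train, cv=5, scoring="accuracy")

# Fit on the full training set, then predict on the held-out test set.
model.fit(X_train, y_train)
y_pred = model.predict(X_test)

# Check metrics & evaluate.
test_accuracy = accuracy_score(y_test, y_pred)
print(f"CV accuracy:   {cv_scores.mean():.3f}")
print(f"Test accuracy: {test_accuracy:.3f}")

# (Optional) Compare with another supervised algorithm, using the same folds setup.
rf_scores = cross_val_score(
    RandomForestClassifier(random_state=0), X_train, y_train, cv=5, scoring="accuracy"
)
print(f"Random forest CV accuracy: {rf_scores.mean():.3f}")
```

Putting the scaler inside the pipeline is a design choice that keeps the "split first, then scale" rule intact even during cross-validation, where each fold gets its own fitted scaler.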



Cheatsheet: Machine Learning Algorithms (Python & R Code)

Source: Includes algorithms for Linear Regression, Logistic Regression, Decision Tree, SVM (Support Vector Machine), Naive Bayes, kNN (k-Nearest Neighbors), k-Means, Random Forest, Dimensionality Reduction Algorithms, and Gradient Boosting & AdaBoost

 
