Training an algorithm takes four ingredients:

  1. Data – feeds the model.
  2. Model – transforms the inputs into outputs (predictions).
  3. Objective Function – estimates how correct the model is on average.
  4. Optimization Algorithm – varies the model’s parameters to improve the objective.

These four steps are repeated over and over until the model performs well; a minimal sketch of the loop follows.
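
As an illustration, here is a minimal NumPy sketch of the loop on a toy linear-regression problem; the dataset, learning rate, and epoch count are arbitrary choices for the example.

```python
import numpy as np

# 1. Data: a toy dataset where the true relationship is t = 2x + 1
rng = np.random.default_rng(0)
x = rng.uniform(-1, 1, size=100)
t = 2 * x + 1 + rng.normal(0, 0.1, size=100)

# 2. Model: f(x) = wx + b, with randomly initialized parameters
w, b = rng.normal(), rng.normal()

eta = 0.1  # learning rate
for epoch in range(200):
    y = w * x + b                        # model outputs
    loss = np.mean((y - t) ** 2)         # 3. Objective: mean squared error
    w -= eta * np.mean(2 * (y - t) * x)  # 4. Optimizer: gradient descent step
    b -= eta * np.mean(2 * (y - t))

print(w, b)  # should approach the true values 2 and 1
```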

Types of Machine Learning

The three main types of machine learning are supervised learning, unsupervised learning, and reinforcement learning. Here’s a comparison and contrast of these three types:

  1. Supervised Learning:
    • Definition: Supervised learning involves training a model using labeled examples, where the input data is accompanied by the corresponding target values.
    • Two Types of Supervised Learning:
      • Classification – outputs are categories.
      • Regression – outputs are numerical (continuous values).
    • Training Process: The model learns from a labeled dataset and tries to generalize patterns and relationships between input and output variables.
    • Goal: The goal of supervised learning is to make accurate predictions or classifications for new, unseen data.
    • Examples: Common algorithms used in supervised learning include linear regression, decision trees, support vector machines (SVM), and artificial neural networks (ANN).
    • Applications: Supervised learning is widely used in tasks such as image classification, spam filtering, sentiment analysis, and speech recognition.
  2. Unsupervised Learning:
    • Definition: Unsupervised learning involves training a model on unlabeled data, without any specific target or output values.
    • Training Process: The model learns patterns, structures, and relationships in the data by finding hidden patterns or clustering similar data points.
    • Goal: The goal of unsupervised learning is to discover meaningful insights, group similar data points, or reduce the dimensionality of the data.
    • Examples: Common algorithms used in unsupervised learning include k-means clustering, hierarchical clustering, principal component analysis (PCA), and autoencoders.
    • Applications: Unsupervised learning is used in tasks such as customer segmentation, anomaly detection, recommendation systems, and data compression.
  3. Reinforcement Learning:
    • Definition: Reinforcement learning involves an agent learning to make decisions in an environment to maximize a reward signal over time.
    • Training Process: The agent learns through a trial-and-error process by taking actions in the environment and receiving feedback in the form of rewards or punishments.
    • Goal: The goal of reinforcement learning is to find an optimal policy or sequence of actions that maximizes the cumulative reward.
    • Examples: Reinforcement learning algorithms include Q-learning, deep Q-networks (DQN), and policy gradients.
    • Applications: Reinforcement learning is used in tasks such as game playing (e.g., AlphaGo), robotics control, autonomous driving, and resource management.

Comparison:

  • Supervised learning and unsupervised learning both involve learning patterns from data, but supervised learning requires labeled data, whereas unsupervised learning works with unlabeled data.
  • Reinforcement learning is different from supervised and unsupervised learning as it involves learning through interaction with an environment and optimizing for rewards.

Contrast:

  • Supervised learning focuses on predicting or classifying new data based on labeled examples, while unsupervised learning aims to find hidden patterns or structures in unlabeled data.
  • Reinforcement learning involves an agent interacting with an environment, receiving feedback in the form of rewards, and learning to maximize the cumulative reward.

It’s important to note that these three types of machine learning are not mutually exclusive, and they can be combined or used in conjunction with each other in various applications to leverage their respective strengths.

The Linear Model in Neural Networks

\[ f(x) = wx+b \]
  • w = weight
  • b = bias

Equivalent vector forms are:

\[ f(x) = x^Tw+b \]
\[ f(x) = w^Tx+b \]
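
As a quick check (with arbitrary example values, assuming NumPy), both vector forms give the same result for a single input vector:

```python
import numpy as np

w = np.array([0.5, -1.0, 2.0])  # weights (illustrative values)
x = np.array([1.0, 2.0, 3.0])   # one input sample
b = 0.1                         # bias

# For 1-D vectors, x^T w and w^T x are the same dot product.
print(x @ w + b)  # x^T w + b
print(w @ x + b)  # w^T x + b
```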

The Linear Model with Multiple Inputs

We can extend the model to multiple inputs, where n, k > 1 and m = 1 (a matrix form is sketched after the definitions below).

Where:

  • n = the number of samples (observations)
  • m = the number of output variables, also the number of biases.
  • k = the number of input variables
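
With these dimensions, a standard matrix formulation of the multiple-input, single-output model is:

\[ f(X) = Xw + b, \quad X \in \mathbb{R}^{n \times k}, \; w \in \mathbb{R}^{k \times 1}, \; b \in \mathbb{R} \]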

The Linear Model with Multiple Outputs

We can extend the model to multiple outputs, where n, k, m > 1 in the above equation; the corresponding matrix form is sketched below.
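
With m outputs, the weights form a k × m matrix W and the biases a row vector of length m (one bias per output):

\[ f(X) = XW + b, \quad X \in \mathbb{R}^{n \times k}, \; W \in \mathbb{R}^{k \times m}, \; b \in \mathbb{R}^{1 \times m} \]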

The Objective Function

Two Types:

  • Loss Functions – used in supervised learning
  • Reward Functions – used in reinforcement learning

The objective function in machine learning refers to a mathematical expression that quantifies the goal or objective of the learning algorithm. It serves as a measure to optimize or minimize during the training process. The specific form of the objective function depends on the type of machine learning algorithm and the problem being addressed. Here are some common objective functions used in different types of machine learning:

  1. Supervised Learning:
    • Regression: In regression tasks, the objective function often involves minimizing the difference between the predicted values and the actual target values. The most common objective function is the Mean Squared Error (MSE), which calculates the average squared difference between the predicted and actual values.
    • Classification: In classification tasks, various objective functions can be used, such as the Cross-Entropy Loss or Log Loss. These functions measure the dissimilarity between the predicted class probabilities and the true class labels.
  2. Unsupervised Learning:
    • Clustering: Objective functions in clustering aim to quantify the compactness of clusters or the separation between different clusters. One commonly used objective function is the Within-Cluster Sum of Squares (WCSS), which measures the sum of squared distances between data points and their cluster centroids.
    • Dimensionality Reduction: Objective functions in dimensionality reduction techniques like Principal Component Analysis (PCA) involve maximizing the captured variance or minimizing the reconstruction error.
  3. Reinforcement Learning:
    • Reinforcement learning typically involves maximizing the cumulative reward over a sequence of actions taken by an agent in an environment. The objective function is often represented as the expected cumulative reward, which is optimized using techniques like Q-learning or policy gradients; a standard formulation is shown below.
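
A standard way to write the expected cumulative reward uses a discount factor γ ∈ [0, 1) (an assumption added here) to weight future rewards r_t:

\[ G = \mathbb{E}\left[ \sum_{t=0}^{\infty} \gamma^t r_t \right] \]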

The choice of the objective function is crucial as it guides the learning algorithm towards finding the optimal or near-optimal solution for the given problem. The algorithm iteratively updates its parameters based on the objective function, using optimization techniques like gradient descent or stochastic optimization to minimize or maximize the objective function.

It’s important to note that the objective function may incorporate regularization terms to prevent overfitting or to introduce additional constraints into the learning process. Regularization terms penalize complex models or encourage specific properties like sparsity or smoothness. The specific form and components of the objective function may vary depending on the specific requirements and constraints of the problem at hand.
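
As a concrete illustration (a standard form, with λ controlling the regularization strength), an L2-regularized regression objective adds a weight penalty to the data-fit term:

\[ J(w) = \sum_{i}(y_i - t_i)^2 + \lambda \|w\|^2 \]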

L2-norm Loss Function

  • Used in Regression.
  • It quantifies the discrepancy or difference between predicted values and actual values by calculating the squared difference between them.
\[ L2 \: norm = \sum_{i}(y_i - t_i)^2 \]

Where:

  • y = output values
  • t = target values

The lower the L2-norm, the better.

  • The optimization process in machine learning involves minimizing the L2-norm loss function by adjusting the model’s parameters. This adjustment is typically done through techniques like gradient descent, where the gradient of the loss function with respect to the model parameters is computed and used to iteratively update the parameters in the direction of steepest descent.
  • One advantage of the L2-norm loss function is that it is convex, meaning it has a unique minimum, and for linear regression minimizing it even admits a closed-form solution. However, the L2-norm loss can be sensitive to outliers, as the squared differences amplify their impact on the loss (illustrated in the sketch below). In cases where outliers are prevalent, alternative loss functions like the L1-norm loss (absolute error) or Huber loss (a combination of L1 and L2) may be more robust.
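
A small NumPy illustration (all values made up) of the L2-norm loss and its sensitivity to outliers compared with the L1 loss:

```python
import numpy as np

y = np.array([1.0, 2.0, 3.0])  # model outputs (illustrative)
t = np.array([1.1, 1.9, 3.2])  # target values

print(np.sum((y - t) ** 2))    # L2-norm loss
print(np.sum(np.abs(y - t)))   # L1-norm loss (absolute error)

# An outlier in the targets inflates the L2 loss far more than the L1 loss,
# because squaring amplifies large differences.
t_outlier = np.array([1.1, 1.9, 13.2])
print(np.sum((y - t_outlier) ** 2))   # dominated by the squared outlier term
print(np.sum(np.abs(y - t_outlier)))
```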

Cross-Entropy Loss Function

  • Used in Classification.
\[ Cross \: Entropy = L(y,t) = -\sum_{i}t_i \ln(y_i) \]

Where:

  • y = output values
  • t = target values

The lower the Cross-Entropy, the better.
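
A minimal NumPy sketch (with an illustrative one-hot target and predicted probabilities):

```python
import numpy as np

t = np.array([0.0, 1.0, 0.0])  # one-hot target: class 2 is the true class
y = np.array([0.1, 0.7, 0.2])  # predicted class probabilities

cross_entropy = -np.sum(t * np.log(y))
print(cross_entropy)  # ≈ 0.357; approaches 0 as the true-class probability → 1
```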

Optimization Algorithms

Optimization algorithms are used in machine learning to iteratively adjust the parameters or weights of a model in order to minimize or maximize an objective function. These algorithms search for the optimal set of parameters that lead to the best performance or highest reward.

1-Parameter Gradient Descent

  • In the context of optimization algorithms, “1-parameter gradient descent” refers to a simplified version of gradient descent that involves optimizing a function with respect to a single parameter.
  • Gradient descent is an iterative first-order optimization algorithm used to find a local minimum/maximum of a given function. In machine learning and deep learning it is commonly used to minimize a cost/loss function (e.g., in linear regression).
  • The cost function acts as a barometer, gauging the model’s accuracy with each iteration of parameter updates. Until the cost is close to or equal to zero, the model continues to adjust its parameters to yield the smallest possible error.
  • The Greek letter η (“eta”) is the learning rate. Set it too low and you will need many iterations to reach the minimum/maximum; set it too high and the iterates will oscillate and never reach it, as the sketch below demonstrates.
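
A minimal sketch of 1-parameter gradient descent using the update rule x ← x − η f′(x); the function f(x) = (x − 3)², the starting point, and the learning rate are illustrative choices:

```python
# Minimize f(x) = (x - 3)^2, whose derivative is f'(x) = 2(x - 3).
def grad(x):
    return 2 * (x - 3)

x = 0.0    # starting point (arbitrary)
eta = 0.1  # learning rate: too low -> many iterations; too high -> oscillation

for _ in range(100):
    x -= eta * grad(x)  # update rule: x <- x - eta * f'(x)

print(x)  # converges toward the minimum at x = 3
```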

n-Parameter Gradient Descent
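
The same idea generalizes to n parameters: each parameter is updated using the partial derivative of the loss with respect to it, i.e. the gradient. A standard form of the update rule is:

\[ w_{i+1} = w_i - \eta \nabla_w L(w_i) \]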

Deep Learning Cheat Sheet