My Notes – Machine Learning

Algorithms

  • LDA (Latent Dirichlet Allocation) – unsupervised, used to discover a user-specified number of topics in a text corpus.
  • NTM (Neural Topic Model) – unsupervised, used to organize a text corpus into topics based on the statistical distribution of words.
  • Object2Vec – an embedding algorithm; learns low-dimensional dense embeddings of high-dimensional objects.
  • XGBoost – open-source, supervised, used for regression, classification, and ranking problems.

Measures

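Low variance vs high variance: a feature with high variance usually carries more information for the model, while near-constant (low-variance) features add little. This is different from model variance, where high variance means overfitting.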

  • \text{F1 Score} = \frac{2 \cdot \text{Precision} \cdot \text{Recall}}{\text{Precision} + \text{Recall}}
  • \text{Recall} = \frac{TP}{TP + FN}
  • \text{Precision} = \frac{TP}{TP + FP}
  • \text{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN}
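The formulas above can be checked with a small Python sketch that computes all four measures from confusion-matrix counts. The function name `classification_metrics` and the example counts are illustrative only; returning 0.0 when a denominator is zero is an assumed convention (libraries handle that case in different ways).

```python
def classification_metrics(tp: int, fp: int, fn: int, tn: int) -> dict:
    """Compute accuracy, precision, recall and F1 from confusion-matrix counts."""
    total = tp + tn + fp + fn
    accuracy = (tp + tn) / total
    # Assumed convention: a metric with a zero denominator is reported as 0.0.
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    # F1 is the harmonic mean of precision and recall.
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


# Illustrative counts: 80 TP, 20 FP, 40 FN, 60 TN.
print(classification_metrics(tp=80, fp=20, fn=40, tn=60))
```

With these counts the function gives accuracy 0.7, precision 0.8, recall ≈ 0.667 and F1 ≈ 0.727 (exactly 8/11): F1 lies between precision and recall, closer to the lower of the two.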

Dimensionality Reduction

  • PCA – Principal Component Analysis, a linear dimensionality-reduction technique. It projects the data onto the directions (principal components) that maximize the variance retained in the lower-dimensional representation.
  • NMF – Non-Negative Matrix Factorization, used for dimensionality reduction, source separation and topic extraction.
  • LDA – Linear Discriminant Analysis, finds a linear combination of features that separates two or more classes of objects. GDA – Generalized Discriminant Analysis, a nonlinear (kernel-based) extension of LDA.
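A minimal sketch of the PCA idea in plain Python (no NumPy), assuming small in-memory data: it centers the data, builds the sample covariance matrix, uses power iteration to find the first principal component (the direction of maximum variance), and projects each point onto it. The helper names and the 2-D sample points are illustrative; in practice a library routine such as scikit-learn's `PCA` would be used.

```python
import math
import random


def first_principal_component(data, iterations=100, seed=0):
    """Return (unit direction of maximum variance, column means) for a list of rows."""
    n, d = len(data), len(data[0])
    means = [sum(row[j] for row in data) / n for j in range(d)]
    centered = [[row[j] - means[j] for j in range(d)] for row in data]
    # Sample covariance matrix (d x d), using the n - 1 denominator.
    cov = [[sum(r[i] * r[j] for r in centered) / (n - 1) for j in range(d)]
           for i in range(d)]
    # Power iteration: repeatedly multiplying by cov and renormalizing converges
    # to the eigenvector with the largest eigenvalue (the first component).
    rng = random.Random(seed)
    v = [rng.random() + 0.1 for _ in range(d)]
    for _ in range(iterations):
        w = [sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    return v, means


def project(data, component, means):
    """Map each row to a single coordinate along the component (1-D representation)."""
    return [sum((row[j] - means[j]) * component[j] for j in range(len(row)))
            for row in data]


points = [[2.5, 2.4], [0.5, 0.7], [2.2, 2.9], [1.9, 2.2], [3.1, 3.0],
          [2.3, 2.7], [2.0, 1.6], [1.0, 1.1], [1.5, 1.6], [1.1, 0.9]]
pc1, mu = first_principal_component(points)
print(pc1)
print(project(points, pc1, mu))
```

For these points the component comes out at roughly (0.678, 0.735); the sign of a principal component is arbitrary, so (-0.678, -0.735) is equally valid. The sample variance of the projected values equals the largest eigenvalue of the covariance matrix (about 1.284 here), the maximum achievable by any 1-D projection.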

Hyperparameter Tuning

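  • Grid Search – exhaustive search through a manually specified subset of the hyperparameter space.
  • Random Search – replaces the exhaustive enumeration of all combinations with randomly selected ones.
  • Bayesian Optimization – treats tuning as a regression problem: fits a model to the results of previous trials and uses it to choose the next hyperparameter values to evaluate.
  • Hyperband – can only be used to tune iterative algorithms that publish objective metrics (e.g. accuracy) after every epoch; it stops under-performing training jobs early and reallocates resources to promising ones.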
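The difference between grid search and random search can be sketched in Python with a toy objective. `validation_loss` is a hypothetical stand-in for training a model and scoring it on validation data, and the hyperparameter names and ranges are assumptions chosen for illustration; Bayesian optimization and Hyperband are not shown.

```python
import itertools
import random


def validation_loss(learning_rate, max_depth):
    """Hypothetical objective: a pretend validation loss, lowest at lr=0.1, depth=6."""
    return 100 * (learning_rate - 0.1) ** 2 + 0.01 * (max_depth - 6) ** 2


def grid_search(grid):
    """Exhaustively evaluate every combination of the listed values."""
    names = list(grid)
    best = None
    for values in itertools.product(*(grid[name] for name in names)):
        params = dict(zip(names, values))
        loss = validation_loss(**params)
        if best is None or loss < best[0]:
            best = (loss, params)
    return best


def random_search(samplers, n_trials, seed=0):
    """Evaluate n_trials configurations, each value drawn from its sampler."""
    rng = random.Random(seed)
    best = None
    for _ in range(n_trials):
        params = {name: sample(rng) for name, sample in samplers.items()}
        loss = validation_loss(**params)
        if best is None or loss < best[0]:
            best = (loss, params)
    return best


grid = {"learning_rate": [0.01, 0.1, 1.0], "max_depth": [3, 6, 9]}
samplers = {
    "learning_rate": lambda rng: 10 ** rng.uniform(-3, 0),  # log-uniform in [0.001, 1]
    "max_depth": lambda rng: rng.randint(2, 10),            # integer in [2, 10]
}

print("grid:  ", grid_search(grid))                    # 3 x 3 = 9 evaluations
print("random:", random_search(samplers, n_trials=9))  # same budget of 9
```

Both runs use the same budget of 9 evaluations. Grid search only ever tries the 3 × 3 listed values (here the optimum happens to lie on the grid), while random search samples the learning rate on a log scale, so it can land on values between grid points.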

AWS Services

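  • Amazon Kinesis Data Firehose – near-real-time delivery of streaming data; captures, transforms and loads data into data lakes, warehouses and analytics services.
  • Amazon Kinesis Data Streams – capacity is managed through shards (scaled manually in provisioned mode); stores data for a configurable retention period; supports custom producer and consumer applications.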
  • Amazon Personalize – fully managed service for building personalized recommendations; learns continuously to improve performance.
  • Amazon Forecast – fully managed, uses historical time-series data to generate forecasts, e.g. for resource planning and financial planning.
  • Amazon Rekognition – search, verify and organize images and videos.