Machine Learning

Course objectives:

This comprehensive course offers participants a hands-on introduction to machine learning. It covers essential topics such as data preprocessing, regression, classification, clustering, and dimensionality reduction, and introduces neural networks, decision trees, and random forests. The course is ideal for professionals aiming to transition into data science, as well as students and researchers interested in artificial intelligence.

Participant profile:

Grade 12 (G12) students, data science enthusiasts, IT professionals, computer science students, and researchers interested in AI and machine learning applications.

Requirements for participants:

Basic computer skills.

Length of the course:

  • Credit hours: 3
  • Total hours required: 20
  • Delivery format: online/offline
  • Contact hours: 3
  • Self-study hours: 10
  • Final control: Project

Course content

1. Introduction to Machine Learning

Topics:

  • Overview of machine learning concepts
  • Types of machine learning: supervised vs. unsupervised
  • Setting up the machine learning environment (Anaconda, Jupyter Notebook)

Subtopics:

  1. Basic concepts of machine learning and its importance.
  2. Types of machine learning: supervised vs. unsupervised.
  3. Setting up Anaconda and Jupyter Notebook for ML projects.

Classroom activities:

  • Interactive lecture on ML concepts and types, with real-world examples.
  • Hands-on demo: Installing Anaconda and running a basic script in Jupyter Notebook.
  • Group discussion on supervised and unsupervised learning use cases.

Self-study tasks:

  • Research applications of supervised and unsupervised learning in different industries.
  • Practice using Jupyter Notebook by writing and running a Python script.
  • Watch an introductory video or tutorial on machine learning basics.

Hours: 5
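The supervised/unsupervised distinction in this module can be illustrated with a short script of the kind students might run in their first Jupyter Notebook session. This is a sketch, not official course material: it assumes scikit-learn is installed (it ships with the standard Anaconda distribution) and uses the library's built-in Iris dataset as a stand-in for a real project dataset.

```python
# Sketch: supervised vs. unsupervised learning on the same measurements.
# Assumes scikit-learn is installed; Iris is a stand-in dataset.
from sklearn.cluster import KMeans
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression

# 150 flowers, 4 measurements each, and a known species label (0, 1 or 2)
X, y = load_iris(return_X_y=True)

# Supervised: learn from features X together with the known labels y.
clf = LogisticRegression(max_iter=1000)
clf.fit(X, y)
train_accuracy = clf.score(X, y)  # measured on training data, so optimistic
print(f"Supervised accuracy on the training data: {train_accuracy:.2f}")

# Unsupervised: only X is given; k-means groups similar flowers by itself.
# The cluster numbers are arbitrary and need not match the species labels.
kmeans = KMeans(n_clusters=3, n_init=10, random_state=0)
cluster_ids = kmeans.fit_predict(X)
sizes = [int((cluster_ids == k).sum()) for k in range(3)]
print("Flowers per cluster:", sizes)
```

The supervised model can report how well it reproduces the known species, while k-means can only report how the flowers were grouped, because it never saw the labels.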

2. Data Preprocessing

Topics:

  • Data cleaning and preparation
  • Feature scaling and normalization
  • Data splitting: training and test sets

Subtopics:

  1. Techniques for data cleaning and handling missing values.
  2. Importance and methods of feature scaling and normalization.
  3. Splitting data into training and test sets.

Classroom activities:

  • Interactive demo: Cleaning a sample dataset by handling missing values and outliers.
  • Hands-on coding: Apply feature scaling techniques using Python libraries.
  • Group activity: Split a dataset into training and test sets and discuss the role of each.

Self-study tasks:

  • Practice cleaning a dataset and handling missing values using Python (pandas).
  • Implement feature scaling and normalization on a new dataset.
  • Research different data-splitting strategies and their impact on model performance.

Hours: 5
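A minimal sketch of this module's three steps, assuming pandas and scikit-learn. The tiny table and its columns (age, income, bought) are made up for illustration. The split is done first so that the fill-in medians and the scaling parameters are learned from the training set only, which keeps information from the test set out of the preprocessing.

```python
# Sketch: split, clean, and scale a small made-up table.
# Column names and values are hypothetical, for illustration only.
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import MinMaxScaler, StandardScaler

df = pd.DataFrame({
    "age": [25, 32, np.nan, 51, 46, 38, 29, np.nan, 60, 41],
    "income": [32_000, 48_000, 51_000, np.nan, 75_000,
               58_000, 39_000, 44_000, 90_000, 61_000],
    "bought": [0, 1, 0, 1, 1, 1, 0, 0, 1, 1],
})
X = df[["age", "income"]]
y = df["bought"]

# 1. Split first, so later statistics come from the training set only.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42)

# 2. Clean: replace missing values with each column's training-set median.
train_medians = X_train.median()
X_train = X_train.fillna(train_medians)
X_test = X_test.fillna(train_medians)

# 3a. Standardize: mean 0 and standard deviation 1, learned on training data.
scaler = StandardScaler().fit(X_train)
X_train_std = scaler.transform(X_train)
X_test_std = scaler.transform(X_test)

# 3b. Alternative: min-max normalization maps training values into [0, 1].
minmax = MinMaxScaler().fit(X_train)
X_train_norm = minmax.transform(X_train)
X_test_norm = minmax.transform(X_test)  # test values may fall outside [0, 1]

print(X_train_std.round(2))
```

Calling fit only on the training data and transform on both sets is the key design choice; fitting the scaler on the full dataset would let test-set statistics influence training.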

3. Regression and Classification

Topics:

  • Linear regression model
  • Logistic regression for classification
  • Evaluation metrics for classification and regression models

Subtopics:

  1. Understanding and implementing a linear regression model.
  2. Basics of logistic regression for classification tasks.
  3. Evaluation metrics: accuracy, precision, and recall for classification; RMSE for regression.

Classroom activities:

  • Interactive coding: Build a simple linear regression model and visualize the results.
  • Hands-on exercise: Train a logistic regression model on a sample dataset.
  • Discussion: Analyze evaluation metrics and interpret their significance.

Self-study tasks:

  • Write a Python script to implement linear regression using scikit-learn.
  • Train and test a logistic regression model on a new classification dataset.
  • Research and summarize when to use specific evaluation metrics for regression and classification.

Hours: 5
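One way the exercises in this module might look, sketched with scikit-learn. The built-in diabetes (regression) and breast cancer (binary classification) datasets stand in for the unspecified sample datasets, and the split sizes and random seeds are arbitrary choices.

```python
# Sketch: linear and logistic regression with the module's metrics.
# Built-in scikit-learn datasets stand in for the course's sample data.
import numpy as np
from sklearn.datasets import load_breast_cancer, load_diabetes
from sklearn.linear_model import LinearRegression, LogisticRegression
from sklearn.metrics import (accuracy_score, mean_squared_error,
                             precision_score, recall_score)
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# --- Linear regression: predict a continuous value ---
X, y = load_diabetes(return_X_y=True)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=0)
reg = LinearRegression().fit(X_tr, y_tr)
# RMSE = square root of the mean squared error, in the target's units
rmse = np.sqrt(mean_squared_error(y_te, reg.predict(X_te)))
print(f"Linear regression RMSE: {rmse:.1f}")

# --- Logistic regression: predict one of two classes ---
X, y = load_breast_cancer(return_X_y=True)
X_tr, X_te, y_tr, y_te = train_test_split(
    X, y, test_size=0.2, random_state=0, stratify=y)
# Scaling first helps the solver converge; both steps are fit on training data.
clf = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))
clf.fit(X_tr, y_tr)
pred = clf.predict(X_te)

accuracy = accuracy_score(y_te, pred)    # share of all predictions that are correct
precision = precision_score(y_te, pred)  # of predicted positives, share truly positive
recall = recall_score(y_te, pred)        # of actual positives, share the model found
print(f"Accuracy {accuracy:.3f}, precision {precision:.3f}, recall {recall:.3f}")
```

RMSE is expressed in the units of the target, so lower is better; accuracy, precision, and recall range from 0 to 1. Precision and recall here treat class 1 (benign, in this dataset) as the positive class.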

4. Clustering and Dimensionality Reduction

Topics:

  • k-means clustering
  • Hierarchical clustering
  • PCA for dimensionality reduction

Subtopics:

  1. Fundamentals and implementation of k-means clustering.
  2. Overview of hierarchical clustering and dendrograms.
  3. Principal Component Analysis (PCA) for reducing dimensionality.

Classroom activities:

  • Hands-on coding: Apply k-means clustering to group a dataset and visualize the clusters.
  • Interactive demo: Perform hierarchical clustering and interpret a dendrogram.
  • Group discussion: Explore when and how PCA can improve model performance by reducing dimensionality.

Self-study tasks:

  • Practice k-means clustering on a different dataset and evaluate the results.
  • Use Python to apply hierarchical clustering and create a dendrogram.
  • Research and implement PCA on a high-dimensional dataset, interpreting the variance explained.

Hours: 5
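A compact sketch covering this module's three techniques, assuming scikit-learn and SciPy (which scikit-learn depends on). The built-in wine dataset, the candidate values of k, Ward linkage, and the 90% variance threshold are illustrative choices, not course requirements.

```python
# Sketch: k-means, hierarchical clustering, and PCA on one dataset.
# Assumes scikit-learn and SciPy; the wine data is a stand-in.
from scipy.cluster.hierarchy import dendrogram, fcluster, linkage
from sklearn.cluster import KMeans
from sklearn.datasets import load_wine
from sklearn.decomposition import PCA
from sklearn.metrics import silhouette_score
from sklearn.preprocessing import StandardScaler

# 178 wines, 13 chemical measurements, standardized so no feature dominates
X = StandardScaler().fit_transform(load_wine().data)

# k-means: try a few values of k and compare silhouette scores
# (closer to 1 means tighter, better-separated clusters).
silhouettes = {}
for k in (2, 3, 4):
    labels = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(X)
    silhouettes[k] = silhouette_score(X, labels)
    print(f"k={k}: silhouette {silhouettes[k]:.3f}")

# Hierarchical clustering: Ward linkage repeatedly merges the pair of clusters
# whose merger least increases within-cluster variance, building a tree.
Z = linkage(X, method="ward")
tree_labels = fcluster(Z, t=3, criterion="maxclust")  # cut into at most 3 clusters
tree = dendrogram(Z, no_plot=True)  # drop no_plot=True to draw it (needs matplotlib)

# PCA: keep the fewest components that explain more than 90% of the variance.
pca = PCA(n_components=0.90, svd_solver="full")
X_reduced = pca.fit_transform(X)
print(f"{pca.n_components_} of 13 components kept")
print("Variance explained per component:", pca.explained_variance_ratio_.round(3))
```

Passing a float between 0 and 1 as n_components asks PCA to choose the number of components from the explained-variance threshold. Whether the reduced data actually improves a downstream model still has to be checked case by case.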

5. Advanced Topics and Applications

Topics:

  • Introduction to neural networks
  • Overview of decision trees and random forests

Subtopics:

  1. Basics of neural networks: structure and working principles.
  2. Introduction to decision trees: building and interpreting decision rules.
  3. Overview of random forests: ensemble learning and improving accuracy.

Classroom activities:

  • Interactive demo: Visualize the structure of a simple neural network and explain its components.
  • Hands-on coding: Build and interpret a decision tree using a small dataset.
  • Group activity: Compare the results of decision trees and random forests on a sample dataset.

Self-study tasks:

  • Research and summarize the differences between neural networks, decision trees, and random forests.
  • Practice building a random forest model using scikit-learn and evaluate its accuracy.
  • Explore an online resource or tutorial on neural networks for further understanding.

Hours: 5
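The tree-versus-forest comparison from the group activity, extended with a small neural network for the self-study summary, might be sketched as follows. It assumes scikit-learn; the wine dataset, the 70/30 split, max_depth=3, 200 trees, and the single 16-unit hidden layer are arbitrary illustrative settings, so the exact scores will change with them.

```python
# Sketch: compare a decision tree, a random forest, and a small neural network.
# Assumes scikit-learn; dataset and hyperparameters are illustrative only.
from sklearn.datasets import load_wine
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.tree import DecisionTreeClassifier, export_text

data = load_wine()
X_tr, X_te, y_tr, y_te = train_test_split(
    data.data, data.target, test_size=0.3, random_state=0, stratify=data.target)

models = {
    # A shallow tree stays small enough to read as if/else rules.
    "decision tree": DecisionTreeClassifier(max_depth=3, random_state=0),
    # An ensemble of trees, each fit on a bootstrap sample of the training data.
    "random forest": RandomForestClassifier(n_estimators=200, random_state=0),
    # A small feed-forward network; it needs scaled inputs to train well.
    "neural network": make_pipeline(
        StandardScaler(),
        MLPClassifier(hidden_layer_sizes=(16,), max_iter=2000, random_state=0)),
}

scores = {}
for name, model in models.items():
    model.fit(X_tr, y_tr)
    scores[name] = model.score(X_te, y_te)  # test-set accuracy
    print(f"{name:15s} test accuracy: {scores[name]:.3f}")

# The fitted tree's decision rules as indented text.
rules = export_text(models["decision tree"], feature_names=list(data.feature_names))
print(rules)
```

The printed rules show the readable if/else logic a single tree produces; the forest combines the votes of many such trees, and the network offers no comparable rule listing.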

Final control

Assessment Components:

  1. Final Exam (40%):

    • Format: Written and practical components.
    • Content: Covers all five modules, including machine learning concepts, data preprocessing, regression and classification, clustering and dimensionality reduction, and advanced topics.
    • Skills Assessed: Application of concepts, implementation of algorithms, and interpretation of results.
  2. Project (30%):

    • Description: Develop a machine learning model to solve a real-world problem.
    • Example: Predict housing prices using regression, or classify customer reviews using logistic regression.
    • Evaluation Criteria: Problem understanding and approach, code quality and functionality, analysis and presentation of results.
  3. Class Participation and Assignments (20%):

    • Description: Ongoing assessment of participation in classroom activities and submission of weekly tasks.
    • Evaluation Criteria: Consistency and engagement in class, accuracy and completeness of assignments.
  4. Quizzes (10%):

    • Description: Two quizzes during the course to assess understanding of key topics.
    • Format: Multiple-choice, coding snippets, and short-answer questions.
    • Content: Focuses on data preprocessing, regression, and clustering.

Learning Outcomes:

By the end of the course, students will be able to:

  1. Machine Learning Fundamentals:

    • Explain key concepts of machine learning and distinguish between supervised and unsupervised learning.
  2. Data Preprocessing:

    • Perform data cleaning, feature scaling, and normalization, and split datasets into training and test sets.
  3. Regression and Classification:

    • Build and evaluate linear regression and logistic regression models.
    • Apply and interpret evaluation metrics like accuracy, precision, recall, and RMSE.
  4. Clustering and Dimensionality Reduction:

    • Implement clustering algorithms like k-means and hierarchical clustering.
    • Apply Principal Component Analysis (PCA) to reduce dimensionality and, where appropriate, improve model performance.
  5. Advanced Techniques:

    • Understand the basics of neural networks, decision trees, and random forests.
    • Implement and analyze these models in Python for solving real-world problems.
  6. Practical Implementation:

    • Use Python libraries such as pandas, scikit-learn, and matplotlib to preprocess data, build models, and visualize results.
  7. Critical Thinking:

    • Analyze the strengths and limitations of different machine learning techniques.
    • Evaluate and optimize models based on problem requirements and dataset characteristics.
  8. Ethical Considerations:

    • Understand the ethical implications of data handling and machine learning applications.