Introduction to Machine Learning

Transforming Data into Business Intelligence

From Concepts to Real-World Applications

ISM6251: Week 1 - Part 2
Dr. Tim Smith | Fall 2025

What is Machine Learning?

Evolution of Intelligent Systems

Arthur Samuel (1959):
"Field of study that gives computers the ability to learn without being explicitly programmed."

Traditional Approach

  • Explicit programming
  • Fixed rules
  • Manual updates
  • Limited adaptability

Machine Learning Approach

  • Learn from data
  • Discover patterns
  • Automatic improvement
  • Adaptive systems
Key Insight: ML enables computers to improve their performance on tasks through experience, without being explicitly programmed for every scenario.

Formal Definition of Machine Learning

Tom Mitchell's Framework (1997)

A computer program is said to learn from experience E with respect to some class of tasks T and performance measure P, if its performance at tasks in T, as measured by P, improves with experience E.

Task (T)

  • Classification
  • Regression
  • Clustering
  • Recommendation

Experience (E)

  • Training data
  • User feedback
  • Historical records
  • Sensor readings

Performance (P)

  • Accuracy
  • Error rate
  • Precision/Recall
  • Business metrics
Example: An email spam filter learns from labeled emails (E) to classify new emails (T), improving its accuracy (P) over time.
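
To make the E/T/P framing concrete, here is a minimal sketch that trains a classifier on progressively more labeled examples and reports test accuracy at each size. The two-cluster synthetic data, the sample sizes, and the choice of LogisticRegression are illustrative assumptions; this is not a real spam dataset.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score

rng = np.random.default_rng(0)

def make_data(n):
    """Synthetic stand-in for labeled emails: two overlapping 2-D clusters."""
    y = rng.integers(0, 2, size=n)              # 0 = ham, 1 = spam (hypothetical)
    X = rng.normal(size=(n, 2)) + np.where(y[:, None] == 1, 1.0, -1.0)
    return X, y

X_train, y_train = make_data(1000)
X_test, y_test = make_data(500)

# Experience E: more labeled examples; Task T: classify; Performance P: accuracy
accuracies = {}
for n in [50, 200, 1000]:
    model = LogisticRegression().fit(X_train[:n], y_train[:n])
    accuracies[n] = accuracy_score(y_test, model.predict(X_test))

for n, acc in accuracies.items():
    print(f"trained on {n:4d} examples -> test accuracy {acc:.3f}")
```

Accuracy usually rises or plateaus as the training set grows, though results on small samples are noisy.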

Machine Learning vs Traditional Programming

Paradigm Shift in Problem Solving

Traditional Programming

# Rule-based approach
def classify_temperature(temp):
    if temp > 90:
        return "Hot"
    elif temp > 70:
        return "Warm"
    elif temp > 50:
        return "Cool"
    else:
        return "Cold"
  • Programmer defines rules
  • Logic is explicit
  • Fixed thresholds
  • Doesn't adapt to context

Machine Learning

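# Learning-based approach
# X_train, y_train: labeled examples; X_new: unseen inputs (placeholders, defined elsewhere)
from sklearn.tree import DecisionTreeClassifier

# Learn decision rules from the labeled data
model = DecisionTreeClassifier()
model.fit(X_train, y_train)

# Predict labels for new data
prediction = model.predict(X_new)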
  • Algorithm discovers rules
  • Logic is learned
  • Thresholds learned from data
  • Adapts to patterns in data
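
The contrast can be sketched end to end: label some temperatures with the hand-written rule, then let a decision tree recover the thresholds from the examples alone. The training range (0 to 119) and the feature name `temp` are assumptions made for illustration.

```python
from sklearn.tree import DecisionTreeClassifier, export_text

def classify_temperature(temp):
    """Hand-written rule from the slide, used here only to label examples."""
    if temp > 90:
        return "Hot"
    elif temp > 70:
        return "Warm"
    elif temp > 50:
        return "Cool"
    return "Cold"

# Labeled examples: integer temperatures 0..119 with their rule-based labels
temps = list(range(120))
X = [[t] for t in temps]
y = [classify_temperature(t) for t in temps]

model = DecisionTreeClassifier(random_state=0).fit(X, y)

# Print the split points the tree learned from the data
print(export_text(model, feature_names=["temp"]))
print(model.predict([[95], [80], [60], [40]]))
```

The tree never sees the `if`/`elif` logic; it only sees inputs and labels. If the labels changed (say, for a different climate), retraining would produce different split points without any code change.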

Types of Machine Learning

Three Main Paradigms

🎯 Supervised Learning

Learning from labeled examples

  • Known inputs & outputs
  • Learn mapping function
  • Predict new outputs
Example: Predicting house prices from features

🔍 Unsupervised Learning

Finding patterns without labels

  • No target variable
  • Discover structure
  • Group similar items
Example: Customer segmentation

🎮 Reinforcement Learning

Learning through interaction

  • Agent and environment
  • Actions and rewards
  • Optimize strategy
Example: Game playing AI

Supervised Learning

Learning from Labeled Data

Classification

Predict discrete categories

  • Binary: Yes/No, True/False
  • Multi-class: Multiple categories
  • Multi-label: Multiple tags
Business Uses:
• Fraud detection
• Customer churn prediction
• Email categorization

Regression

Predict continuous values

  • Linear: Straight-line relationships
  • Non-linear: Complex patterns
  • Time series: Temporal data
Business Uses:
• Sales forecasting
• Price optimization
• Demand prediction
Algorithm            | Type           | Use Case             | Pros                  | Cons
Linear Regression    | Regression     | Price prediction     | Simple, interpretable | Assumes linearity
Logistic Regression  | Classification | Binary outcomes      | Probabilistic output  | Linear boundaries
Decision Trees       | Both           | Rule-based decisions | Interpretable         | Overfitting prone
Random Forest        | Both           | Complex patterns     | Robust, accurate      | Black box
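
A minimal sketch of the two supervised tasks side by side, using the first two algorithms in the table. The house-size and inactivity figures are made-up toy values, chosen only so the behavior is easy to follow.

```python
import numpy as np
from sklearn.linear_model import LinearRegression, LogisticRegression

# Regression: predict a continuous value (hypothetical price from size)
size = np.array([[50], [80], [100], [120], [150]])   # square meters
price = 2.0 * size.ravel() + 10.0                    # exactly linear toy data
reg = LinearRegression().fit(size, price)
print("slope:", reg.coef_[0], "intercept:", reg.intercept_)
print("predicted price for 110:", reg.predict([[110]])[0])

# Classification: predict a discrete label (hypothetical churn from inactivity)
days_inactive = np.array([[1], [3], [5], [30], [45], [60]])
churned = np.array([0, 0, 0, 1, 1, 1])
clf = LogisticRegression().fit(days_inactive, churned)
print("predicted class:", clf.predict([[2], [50]]))
print("churn probability:", clf.predict_proba([[50]])[0, 1])
```

Both models share the same `fit`/`predict` interface; the difference is the type of target. The classifier can also return a probability, which is what the "Probabilistic output" entry in the table refers to.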

Unsupervised Learning

Discovering Hidden Patterns

Clustering

  • K-Means: Partition into K groups
  • Hierarchical: Tree of clusters
  • DBSCAN: Density-based
Applications:
• Customer segmentation
• Document organization
• Image compression

Dimensionality Reduction

  • PCA: Principal components
  • t-SNE: Visualization
  • Autoencoders: Neural approach
Applications:
• Feature extraction
• Data visualization
• Noise reduction
Challenge: Without labels, evaluating performance is subjective. Success is often measured by business outcomes rather than accuracy metrics.
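
As a rough illustration of clustering and of one way to score it without labels, the sketch below runs K-Means on synthetic customer data and computes a silhouette score. The two customer groups and their spend/visit values are invented for the example; the silhouette score is an internal quality measure and does not replace validation against business outcomes.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score

rng = np.random.default_rng(42)

# Hypothetical customers: [annual spend, visits per month], two loose groups
low_spenders = rng.normal(loc=[20.0, 2.0], scale=1.0, size=(50, 2))
high_spenders = rng.normal(loc=[80.0, 10.0], scale=1.0, size=(50, 2))
X = np.vstack([low_spenders, high_spenders])

kmeans = KMeans(n_clusters=2, n_init=10, random_state=0)
labels = kmeans.fit_predict(X)

# Silhouette ranges from -1 to 1; higher means tighter, better-separated clusters
score = silhouette_score(X, labels)
print("cluster sizes:", np.bincount(labels))
print("silhouette:", round(score, 3))
```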

Reinforcement Learning

Learning Through Interaction

Key Components

  • Agent: The learner/decision maker
  • Environment: What the agent interacts with
  • State: Current situation
  • Action: What the agent can do
  • Reward: Feedback signal
  • Policy: Strategy for choosing actions

How It Works

  1. Agent observes current state
  2. Selects action based on policy
  3. Environment responds with new state
  4. Agent receives reward/penalty
  5. Updates policy to maximize future rewards

Common Algorithms

  • Q-Learning: Learn action values
  • SARSA: On-policy learning
  • Deep Q-Networks: Neural network approach
  • Policy Gradient: Direct policy optimization
  • Actor-Critic: Combines value and policy

Business Applications

  • Dynamic Pricing: Optimize prices in real-time
  • Recommendation Systems: Sequential recommendations
  • Resource Allocation: Optimize resource usage
  • Trading Strategies: Automated trading decisions
  • Supply Chain: Inventory management
Key Difference: Unlike supervised learning with fixed datasets, RL learns from experience through trial and error, balancing exploration of new strategies with exploitation of known good strategies.
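
The observe/act/reward/update loop described above can be sketched with tabular Q-learning on a toy problem. The five-cell corridor environment, the reward of 1 at the right-hand end, and the values of `alpha`, `gamma`, and `epsilon` are all assumptions chosen to keep the example small.

```python
import numpy as np

rng = np.random.default_rng(0)
n_states, n_actions = 5, 2          # corridor cells 0..4; actions: 0 = left, 1 = right
goal = n_states - 1
alpha, gamma, epsilon = 0.5, 0.9, 0.2
Q = np.zeros((n_states, n_actions))

def step(state, action):
    """Environment: move one cell; reward 1 only on reaching the goal."""
    next_state = max(state - 1, 0) if action == 0 else state + 1
    reward = 1.0 if next_state == goal else 0.0
    return next_state, reward

def choose_action(state):
    """Epsilon-greedy policy with random tie-breaking."""
    if rng.random() < epsilon:
        return int(rng.integers(n_actions))          # explore
    best = np.flatnonzero(Q[state] == Q[state].max())
    return int(rng.choice(best))                     # exploit

for episode in range(500):
    state = 0
    while state != goal:
        action = choose_action(state)
        next_state, reward = step(state, action)
        # Q-learning update: move Q toward reward + discounted best future value
        target = reward + gamma * Q[next_state].max()
        Q[state, action] += alpha * (target - Q[state, action])
        state = next_state

policy = Q[:goal].argmax(axis=1)     # greedy action in each non-goal state
print("greedy policy (1 = right):", policy)
```

The `epsilon` parameter controls the exploration/exploitation balance: with probability `epsilon` the agent tries a random action, otherwise it takes the action it currently believes is best.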

The Machine Learning Workflow

End-to-End Process

1. Problem Definition
2. Data Collection
3. Data Preparation
4. Model Selection
5. Model Training
6. Model Evaluation
7. Deployment
8. Monitoring & Maintenance

Step 1: Problem Definition

Framing the Business Challenge

Key Questions to Answer

  • What is the business problem?
  • Is ML the right solution?
  • What does success look like?
  • What are the constraints?
  • Who are the stakeholders?

Define Clear Objectives

  • Business Goal: Increase revenue, reduce costs
  • ML Task: Classification, regression, clustering
  • Success Metrics: Accuracy, ROI, user satisfaction
  • Constraints: Time, budget, resources
  • Deliverables: Model, API, dashboard
Common Mistake: Starting with a solution in mind rather than understanding the problem: "We need deep learning!" vs. "We need to reduce customer churn by 20%."

Step 2: Data Collection

Gathering Your Raw Materials

Data Sources

  • Internal: Databases, logs, CRM systems
  • External: APIs, public datasets, purchases
  • Generated: Surveys, experiments, sensors
  • Third-party: Data vendors, partnerships

Quality Considerations

  • Relevance: Does it help solve the problem?
  • Accuracy: How reliable is the source?
  • Completeness: Are there gaps?
  • Timeliness: Is it current enough?
  • Volume: Do we have enough data?
Legal & Ethical: Always verify data usage rights, privacy compliance (GDPR, CCPA), and consider ethical implications of data collection.

Step 3: Data Preparation

Transforming Raw Data into ML-Ready Format

Data Cleaning

  • Handle missing values (drop, impute, flag)
  • Remove duplicates
  • Fix inconsistencies
  • Detect and handle outliers
  • Correct data types

Feature Engineering

  • Create new features from existing ones
  • Encode categorical variables
  • Scale/normalize numerical features
  • Extract temporal features
  • Combine features (interactions, polynomials)
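# Example: Data preparation (assumes df has 'age', 'income', 'category' and a label column 'target')
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split

df['age_group'] = pd.cut(df['age'], bins=[0, 25, 40, 60, 100])  # bin ages into ranges
df['log_income'] = np.log1p(df['income'])  # log(1 + x) reduces skew in income
df = pd.get_dummies(df, columns=['category', 'age_group'])  # one-hot encode categoricals
X, y = df.drop(columns='target'), df['target']  # 'target' is a hypothetical column name
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)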
Time Investment: Data preparation commonly takes the majority of project time; 60-80% is a frequently cited rule of thumb. Quality data preparation is the foundation of successful ML models.
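
The cleaning steps listed above (duplicates, missing values, outliers) might look like this on a small table. The records, column names, and the choice of median imputation and IQR capping are assumptions for illustration; the right treatment depends on the data.

```python
import numpy as np
import pandas as pd

# Hypothetical raw records with a missing value, a duplicate row, and an outlier
df = pd.DataFrame({
    "customer_id": [1, 2, 2, 3, 4, 5],
    "age": [34, 51, 51, np.nan, 29, 42],
    "income": [52_000, 61_000, 61_000, 48_000, 55_000, 1_000_000],
})

df = df.drop_duplicates().reset_index(drop=True)      # remove exact duplicates

# Flag, then impute: keep a record of which ages were originally missing
df["age_missing"] = df["age"].isna()
df["age"] = df["age"].fillna(df["age"].median())

# Cap outliers at the 1.5 * IQR fences instead of dropping the rows
q1, q3 = df["income"].quantile([0.25, 0.75])
upper = q3 + 1.5 * (q3 - q1)
lower = q1 - 1.5 * (q3 - q1)
df["income"] = df["income"].clip(lower=lower, upper=upper)

print(df)
```

Flagging before imputing preserves the information that a value was missing, which can itself be predictive.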

Step 4: Model Selection

Choosing the Right Algorithm

Selection Criteria

  • Problem Type: Classification vs regression
  • Data Size: Small vs big data
  • Interpretability: Black box vs explainable
  • Training Time: Real-time vs batch
  • Accuracy Requirements: Good enough vs state-of-the-art

Common Starting Points

  • Linear Models: Simple, fast, interpretable
  • Tree-Based: Handle non-linearity well
  • Ensemble Methods: Often best accuracy
  • Neural Networks: Complex patterns, lots of data
  • SVM: High-dimensional data
Best Practice: Start simple! Begin with logistic/linear regression as a baseline, then try more complex models only if needed. Simple models are easier to deploy and maintain.
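
One way to act on the "start simple" advice is to score a trivial baseline, a simple model, and a more complex model under the same cross-validation. The synthetic data and the three candidates below are illustrative assumptions.

```python
import numpy as np
from sklearn.dummy import DummyClassifier
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(1)

# Synthetic binary problem: the label depends linearly on two features plus noise
X = rng.normal(size=(400, 2))
y = (X[:, 0] + X[:, 1] + rng.normal(scale=0.5, size=400) > 0).astype(int)

candidates = {
    "majority-class baseline": DummyClassifier(strategy="most_frequent"),
    "logistic regression": LogisticRegression(),
    "random forest": RandomForestClassifier(n_estimators=100, random_state=0),
}

scores = {}
for name, model in candidates.items():
    # 5-fold cross-validated accuracy for a like-for-like comparison
    scores[name] = cross_val_score(model, X, y, cv=5).mean()
    print(f"{name:25s} {scores[name]:.3f}")
```

If the complex model does not clearly beat the simple one, the simple one is usually the better choice to deploy and maintain.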

Step 5: Model Training

Teaching the Algorithm to Learn

Training Process

  • Initialize model with default parameters
  • Feed training data to algorithm
  • Algorithm learns patterns
  • Adjust hyperparameters
  • Use validation set for tuning

Hyperparameter Tuning

  • Grid Search: Try all combinations
  • Random Search: Sample randomly
  • Bayesian Optimization: Smart search
  • Cross-Validation: Robust evaluation
  • Early Stopping: Prevent overfitting
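# Example: Model training with hyperparameter tuning
# Assumes X_train and y_train come from the data preparation step
from sklearn.model_selection import GridSearchCV
from sklearn.tree import DecisionTreeClassifier

# Candidate values to try for each hyperparameter (3 x 3 = 9 combinations)
param_grid = {'max_depth': [3, 5, 7], 'min_samples_split': [2, 5, 10]}
grid_search = GridSearchCV(DecisionTreeClassifier(), param_grid, cv=5)  # 5-fold cross-validation
grid_search.fit(X_train, y_train)
best_model = grid_search.best_estimator_  # refit on all training data with the best settings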

Step 6: Model Evaluation

Measuring Performance and Validity

Evaluation Strategy

  • Use test set (never seen during training)
  • Calculate relevant metrics
  • Compare to baseline/benchmark
  • Check for overfitting/underfitting
  • Validate business assumptions

Key Metrics by Type

Regression:
  • RMSE, MAE, MAPE
  • R², Adjusted R²
Classification:
  • Accuracy, Precision, Recall
  • F1-Score, ROC-AUC
  • Confusion Matrix
Business Context: A 95% accurate model isn't useful if the 5% errors are your most valuable customers. Always evaluate in business context, not just statistical metrics.
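
The point about accuracy hiding costly errors can be shown with a small worked example. The 95/5 class split and the two hypothetical models below are invented for illustration.

```python
from sklearn.metrics import accuracy_score, confusion_matrix, precision_score, recall_score

# Hypothetical churn labels: 95 customers stay (0), 5 high-value customers churn (1)
y_true = [0] * 95 + [1] * 5

# Model A always predicts "stay"; Model B catches 4 of 5 churners with 6 false alarms
y_pred_a = [0] * 100
y_pred_b = [0] * 89 + [1] * 6 + [1] * 4 + [0] * 1

for name, y_pred in [("A", y_pred_a), ("B", y_pred_b)]:
    acc = accuracy_score(y_true, y_pred)
    prec = precision_score(y_true, y_pred, zero_division=0)
    rec = recall_score(y_true, y_pred, zero_division=0)
    print(f"Model {name}: accuracy={acc:.2f} precision={prec:.2f} recall={rec:.2f}")

# Rows are actual classes, columns are predicted classes
print(confusion_matrix(y_true, y_pred_b))
```

Model A scores 95% accuracy while missing every churner (recall 0). Model B is slightly less accurate at 93% but finds 80% of the customers the business cares about.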

Step 7: Deployment

Putting Models into Production

Deployment Options

  • Batch: Periodic predictions
  • Real-time: API endpoints
  • Edge: On-device deployment
  • Embedded: Within applications
  • Cloud: Scalable services

Production Considerations

  • Scalability: Handle production load
  • Latency: Response time requirements
  • Integration: Fit into existing systems
  • Versioning: Track model versions
  • Rollback: Plan for failures
MLOps: Modern deployment uses MLOps practices - automated pipelines, containerization (Docker), orchestration (Kubernetes), and model serving platforms (TensorFlow Serving, MLflow).
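
At its simplest, batch deployment means saving a trained model as a versioned artifact and loading it wherever predictions are needed. The sketch below uses joblib for persistence; the file name `churn_model_v1.joblib`, the helper `predict_batch`, and the synthetic data are hypothetical, and a production setup would add the pipeline and serving tooling named above.

```python
import tempfile
from pathlib import Path

import joblib
import numpy as np
from sklearn.linear_model import LogisticRegression

# Train a small stand-in model (synthetic data, for illustration only)
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
y = (X[:, 0] > 0).astype(int)
model = LogisticRegression().fit(X, y)

# Versioning: put the version in the artifact name so rollback is a file swap
model_dir = Path(tempfile.mkdtemp())
artifact = model_dir / "churn_model_v1.joblib"   # hypothetical naming scheme
joblib.dump(model, artifact)

def predict_batch(artifact_path, rows):
    """Batch scoring: load the saved model and return class predictions."""
    loaded = joblib.load(artifact_path)
    return loaded.predict(np.asarray(rows))

new_rows = rng.normal(size=(5, 3))
print(predict_batch(artifact, new_rows))
```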

Step 8: Monitoring & Maintenance

Keeping Models Healthy in Production

What to Monitor

  • Performance Metrics: Accuracy over time
  • Data Drift: Input distribution changes
  • Concept Drift: Relationship changes
  • System Health: Latency, errors, uptime
  • Business Impact: ROI, user satisfaction

Maintenance Actions

  • Retraining: Update with new data
  • Recalibration: Adjust thresholds
  • A/B Testing: Compare versions
  • Feature Updates: Add/remove features
  • Model Replacement: Switch algorithms
Model Decay: All models degrade over time as patterns change. Without monitoring and maintenance, a great model can become harmful to the business.
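
A simple data drift check compares a feature's distribution at training time with its distribution in production. The sketch below uses a two-sample Kolmogorov-Smirnov test from SciPy; the synthetic samples, the helper `drift_p_value`, and the alert threshold `ALPHA` are assumptions, and this is only one of several possible drift tests.

```python
import numpy as np
from scipy.stats import ks_2samp

rng = np.random.default_rng(7)

# Reference sample: a feature as seen at training time (synthetic)
train_feature = rng.normal(loc=50.0, scale=10.0, size=1000)
# Production samples: one from the same distribution, one with a shifted mean
prod_same = rng.normal(loc=50.0, scale=10.0, size=1000)
prod_shifted = rng.normal(loc=60.0, scale=10.0, size=1000)

def drift_p_value(reference, current):
    """Two-sample Kolmogorov-Smirnov test; a small p-value suggests drift."""
    return ks_2samp(reference, current).pvalue

ALPHA = 0.01   # illustrative alert threshold, tune per feature in practice
for name, sample in [("same", prod_same), ("shifted", prod_shifted)]:
    p = drift_p_value(train_feature, sample)
    status = "DRIFT" if p < ALPHA else "ok"
    print(f"{name:8s} p={p:.4f} -> {status}")
```

A drift alert on inputs does not prove that accuracy has dropped, but it is a cheap early signal that retraining may be needed.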

Machine Learning in Business

Creating Competitive Advantage

🚀 Automation

  • Reduce manual tasks
  • Speed up decisions
  • Scale operations
  • 24/7 availability

💡 Intelligence

  • Discover patterns
  • Predict outcomes
  • Optimize processes
  • Personalize experiences

💰 Value Creation

  • Increase revenue
  • Reduce costs
  • Mitigate risks
  • Improve satisfaction
McKinsey Report: Companies using AI/ML report 20% revenue increase and 30% cost reduction in affected business areas.

Industry Applications

ML Transforming Every Sector

💳 Finance

Fraud detection, credit scoring, algorithmic trading

🛍️ Retail

Recommendations, demand forecasting, pricing

🏥 Healthcare

Diagnosis, drug discovery, patient monitoring

🏭 Manufacturing

Quality control, predictive maintenance, optimization

📱 Technology

Search, NLP, computer vision, virtual assistants

🚗 Transportation

Route optimization, autonomous vehicles, logistics

Key Trend: ML is becoming a horizontal technology - essential across all industries, not just tech companies.

Real-World Success Stories

ML Creating Business Impact

Netflix: Recommendation Engine

  • Problem: Content discovery
  • Solution: Collaborative filtering + deep learning
  • Impact: 80% of views from recommendations [1]
  • Value: $1B+ annual savings in retention [2]

Amazon: Demand Forecasting

  • Problem: Inventory optimization
  • Solution: Time series + ML with Amazon Forecast
  • Impact: 20-50% reduction in inventory costs [3]
  • Value: 30-50% reduction in forecast errors [4]

JPMorgan: COiN System

  • Problem: Manual contract review
  • Solution: NLP + machine learning (COiN)
  • Impact: 360,000 hours → seconds [5]
  • Value: 80% reduction in compliance errors [6]

UPS: ORION Route Optimization

  • Problem: Delivery efficiency
  • Solution: ML-powered ORION system
  • Impact: 100M fewer miles/year [7]
  • Value: $300-400M annual savings [8]
References:
[1] Netflix's 80% recommendation-driven viewing (RebuyEngine, 2024)
[2] Netflix $1B+ savings from ML (Harvard Business School, 2017)
[3, 4] Amazon Forecast: 20-50% cost reduction (AWS ML Blog, 2024)
[5] JPMorgan COiN: 360,000 hours to seconds (Bloomberg, 2017)
[6] COiN 80% error reduction (Head of AI, 2024)
[7] UPS ORION: 100M miles saved (INFORMS, 2016)
[8] UPS $400M annual savings (Best Practice AI, 2024)

When to Use Machine Learning

Making the Right Choice

✅ Good Candidates for ML

  • Large amounts of data available
  • Patterns exist but are complex
  • Problem scales beyond human capacity
  • Need for personalization
  • Environment changes over time
  • Prediction/classification tasks
Example: Email spam filtering - patterns evolve, large volume, personalization needed

❌ Poor Candidates for ML

  • Simple rules suffice
  • Limited or no data
  • Need 100% accuracy
  • Complete transparency required
  • Ethical/legal constraints
  • One-time decisions
Example: Calculating taxes - clear rules, legal requirements, need for explainability

ML Readiness Checklist

Are You Ready for Machine Learning?

Data Requirements

  • ☐ Sufficient data volume (1000s+ examples)
  • ☐ Quality data (accurate, consistent)
  • ☐ Representative data (covers all cases)
  • ☐ Legal right to use data
  • ☐ Privacy compliance (GDPR, etc.)

Business Requirements

  • ☐ Clear business objective
  • ☐ Defined success metrics
  • ☐ Acceptable error tolerance
  • ☐ ROI justification
  • ☐ Stakeholder buy-in

Technical Requirements

  • ☐ Technical expertise available
  • ☐ Computing resources
  • ☐ Integration capability
  • ☐ Monitoring infrastructure
  • ☐ Update mechanism

Organizational Requirements

  • ☐ Data-driven culture
  • ☐ Risk tolerance
  • ☐ Change management plan
  • ☐ Ethical guidelines
  • ☐ Long-term commitment

Common ML Challenges

Technical and Business Obstacles

Technical Challenges

  • Data Quality: Missing, noisy, biased data
  • Overfitting: Model memorizes training data
  • Underfitting: Model too simple
  • Feature Engineering: Choosing right inputs
  • Scalability: Handling large datasets
  • Model Drift: Performance degrades over time

Business Challenges

  • Interpretability: Explaining decisions
  • Integration: Fitting into workflows
  • Trust: User acceptance
  • ROI: Measuring value
  • Talent: Finding skilled people
  • Expectations: Managing hype
Reality Check: A widely quoted industry estimate is that 87% of ML projects never make it to production. Success requires addressing both technical and organizational challenges.
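
Overfitting and underfitting can be spotted by comparing training and test scores. The sketch below fits decision trees of different depths to noisy synthetic data; the data-generating rule and the depths tried are illustrative assumptions.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(3)

# Noisy synthetic labels: the true signal is one feature, the rest is noise
X = rng.normal(size=(600, 5))
y = (X[:, 0] + rng.normal(scale=1.0, size=600) > 0).astype(int)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.5, random_state=0)

results = {}
for depth in [1, 3, None]:   # None lets the tree grow until it fits every point
    tree = DecisionTreeClassifier(max_depth=depth, random_state=0).fit(X_train, y_train)
    results[depth] = (tree.score(X_train, y_train), tree.score(X_test, y_test))
    print(f"max_depth={depth}: train={results[depth][0]:.2f} test={results[depth][1]:.2f}")
```

A large gap between training and test accuracy suggests the model has memorized the training data. Low scores on both suggest the model is too simple for the problem.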

Ethical Considerations

Responsible AI and ML

Key Ethical Issues

  • Bias: Unfair treatment of groups
  • Privacy: Personal data protection
  • Transparency: Explainable decisions
  • Accountability: Who's responsible?
  • Security: Adversarial attacks

Best Practices

  • Diverse training data
  • Regular bias audits
  • Privacy by design
  • Human oversight
  • Clear documentation
  • Ethical review boards
Remember: With great power comes great responsibility. ML systems can perpetuate or amplify existing biases if not carefully designed.
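
A basic bias audit compares model behavior across groups. The sketch below computes approval rate and accuracy per group with pandas; the loan-decision records and group labels are fabricated toy values, and the approval-rate gap shown is only one of several fairness measures.

```python
import pandas as pd

# Hypothetical loan decisions: group membership, true outcome, model prediction
audit = pd.DataFrame({
    "group":     ["A", "A", "A", "A", "B", "B", "B", "B"],
    "actual":    [1,   1,   0,   0,   1,   1,   0,   0],
    "predicted": [1,   1,   1,   0,   1,   0,   0,   0],
})
audit["correct"] = audit["actual"] == audit["predicted"]

report = audit.groupby("group").agg(
    approval_rate=("predicted", "mean"),   # share of positive predictions
    accuracy=("correct", "mean"),
)
print(report)

# Demographic parity gap: difference in approval rates between groups
gap = report["approval_rate"].max() - report["approval_rate"].min()
print("approval-rate gap:", gap)
```

In this toy data both groups see the same accuracy (75%), yet group A is approved three times as often as group B. An overall accuracy figure alone would not reveal that difference.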

Your Machine Learning Journey

Building ML Expertise

Foundation (Weeks 1-4)

  • Python programming basics
  • Statistics & probability
  • Data manipulation (pandas)
  • Visualization (matplotlib)
  • Linear algebra basics

Core ML (Weeks 5-8)

  • Supervised learning algorithms
  • Model evaluation
  • Feature engineering
  • Cross-validation
  • Hyperparameter tuning

Advanced (Weeks 9-11)

  • Ensemble methods
  • Neural networks
  • Unsupervised learning
  • Deep learning intro
  • Special topics

Application (Week 12+)

  • Real-world projects
  • End-to-end pipelines
  • Model deployment
  • Performance monitoring
  • Business integration

Essential Tools & Resources

Your ML Toolkit

Languages & Libraries

  • Python: Primary language
  • NumPy: Numerical computing
  • Pandas: Data manipulation
  • Scikit-learn: ML algorithms
  • Matplotlib: Visualization

Development Tools

  • Jupyter: Interactive notebooks
  • Git: Version control
  • Docker: Containerization
  • VS Code: IDE
  • Google Colab: Cloud notebooks

Learning Resources

  • Kaggle: Competitions & datasets
  • Coursera: Online courses
  • Papers: arXiv.org
  • Communities: Reddit, Stack Overflow
  • Books: ISLR, Pattern Recognition
Pro Tip: Start with scikit-learn for classical ML. Move to TensorFlow/PyTorch only when you need deep learning capabilities.

Key Takeaways

What to Remember

Core Concepts

  • ML learns patterns from data
  • Three types: supervised, unsupervised, reinforcement
  • Workflow: problem → data → model → deploy
  • Success requires quality data
  • Business value drives ML adoption

Practical Insights

  • Start with simple models
  • Focus on data quality
  • Measure business impact
  • Consider ethical implications
  • Plan for maintenance
Remember: Machine Learning is a tool, not magic. Success comes from understanding both the technology and the business problem.

Your Next Steps

Preparing for Next Week

  1. DataCamp: Create account with USF email & complete "Introduction to Python" course
  2. Honorlock: Install and validate with "honorlock-test" quiz
  3. Quiz Prep: Review Week 1 lecture content for in-class quiz next week
  4. Equipment: Ensure laptop is fully charged for next class
  5. Connect: Join the course discussion forum
Next Class: We'll dive into hands-on ML with Python, pandas, and scikit-learn. Quiz 1 will be administered during class time.
Support: Contact class TA for DataCamp issues. Honorlock support: 24/7 at USF Support Site