# Week 2B: NumPy Fundamentals - The Foundation of Scientific Python

## Welcome to Numerical Computing with Python!

NumPy (Numerical Python) is the cornerstone of scientific computing in Python. It provides the foundation that powers nearly every data science and machine learning library, including pandas, scikit-learn, TensorFlow, and PyTorch. In this notebook, we'll explore why NumPy is so powerful and how to harness its capabilities for efficient numerical computation.

### Why NumPy Matters

Imagine you're working with a dataset of 1 million data points. With Python lists, a simple operation like squaring each number requires an explicit loop or comprehension and runs slowly. NumPy typically performs this kind of element-wise operation **10-100 times faster**, with cleaner, more readable code.

As the NumPy documentation puts it, NumPy is *"the fundamental package for scientific computing in Python."*

### Core Advantages of NumPy

1. **Performance**: Vectorized operations are often 10-100x faster than equivalent loops over pure Python lists
2. **Vectorization**: Write mathematical expressions without explicit loops
3. **Broadcasting**: Intelligently handle operations between arrays of different shapes
4. **Memory Efficiency**: Contiguous memory storage reduces overhead
5. **Ecosystem Foundation**: Powers pandas, scikit-learn, and most ML libraries

### What We'll Learn

In this comprehensive notebook, we'll master:

1. **Array Creation**: Multiple ways to create and initialize arrays
2. **Indexing and Slicing**: Powerful techniques to access and modify data
3. **Array Operations**: Vectorized computations and broadcasting
4. **Statistical Functions**: Built-in functions for data analysis
5. **Performance Optimization**: Why NumPy is so fast

Let's begin our journey into the world of efficient numerical computing!

## Installing and Importing NumPy

Before we begin, let's make sure NumPy is installed and import it. NumPy is conventionally imported as `np`.

In [None]:
# Import NumPy
import numpy as np

# Check version
print(f"NumPy version: {np.__version__}")

# Set random seed for reproducibility
np.random.seed(42)
print("Random seed set for reproducible results")

## Part 1: Creating NumPy Arrays

### Understanding NumPy Arrays

The core of NumPy is the **ndarray** (n-dimensional array) object. Unlike Python lists, NumPy arrays:
- Have a fixed size at creation
- Contain elements of the same data type
- Enable fast mathematical operations
- Support multidimensional data naturally
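
The properties listed above can be checked directly. The following is a minimal sketch; the variable names (`mixed`, `base`, `grown`) are illustrative and not part of the original notebook.

```python
import numpy as np

# Mixed int/float input is upcast to a single common dtype
mixed = np.array([1, 2.5, 3])
print(mixed.dtype)  # float64

# Arrays have a fixed size: np.append returns a new array
# and leaves the original untouched
base = np.array([1, 2, 3])
grown = np.append(base, 4)
print(base.size, grown.size)  # 3 4
```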

Let's explore the various ways to create arrays:

### Creating Arrays from Python Lists

In [None]:
# Creating a 1D array from a list
python_list = [1, 2, 3, 4, 5]
arr_1d = np.array(python_list)

print("Python list:", python_list)
print("NumPy array:", arr_1d)
print(f"Type of Python list: {type(python_list)}")
print(f"Type of NumPy array: {type(arr_1d)}")
print(f"Array data type: {arr_1d.dtype}")

In [None]:
# Creating a 2D array (matrix) from nested lists
matrix_list = [[1, 2, 3],
               [4, 5, 6],
               [7, 8, 9]]

matrix = np.array(matrix_list)

print("2D Array (Matrix):")
print(matrix)
print(f"\nShape: {matrix.shape}")  # (rows, columns)
print(f"Dimensions: {matrix.ndim}")
print(f"Size (total elements): {matrix.size}")
print(f"Data type: {matrix.dtype}")

In [None]:
# Creating arrays with specific data types
# This is important for memory efficiency and precision

# Integer array
int_array = np.array([1, 2, 3, 4], dtype=np.int32)
print(f"Integer array: {int_array}, dtype: {int_array.dtype}")

# Float array
float_array = np.array([1, 2, 3, 4], dtype=np.float64)
print(f"Float array: {float_array}, dtype: {float_array.dtype}")

# Complex number array
complex_array = np.array([1+2j, 3+4j, 5+6j])
print(f"Complex array: {complex_array}, dtype: {complex_array.dtype}")

# Boolean array
bool_array = np.array([True, False, True, False])
print(f"Boolean array: {bool_array}, dtype: {bool_array.dtype}")

### Initialization Functions

NumPy provides many convenient functions to create arrays with specific patterns:

In [None]:
# Arrays filled with zeros
zeros_1d = np.zeros(5)
zeros_2d = np.zeros((3, 4))  # 3 rows, 4 columns

print("1D zeros array:")
print(zeros_1d)
print("\n2D zeros array:")
print(zeros_2d)

# Arrays filled with ones
ones_1d = np.ones(5)
ones_2d = np.ones((2, 3))

print("\n1D ones array:")
print(ones_1d)
print("\n2D ones array:")
print(ones_2d)

In [None]:
# Arrays filled with a specific value
full_array = np.full((3, 3), 7)
print("Array filled with 7:")
print(full_array)

# Identity matrix (diagonal of 1s)
identity = np.eye(4)
print("\n4x4 Identity matrix:")
print(identity)

# Diagonal matrix
diagonal = np.diag([1, 2, 3, 4])
print("\nDiagonal matrix:")
print(diagonal)

### Creating Sequences and Ranges

In [None]:
# arange: similar to Python's range() but returns an array
# arange(start, stop, step)

range1 = np.arange(10)  # 0 to 9
print(f"Simple range: {range1}")

range2 = np.arange(1, 11)  # 1 to 10
print(f"Range from 1 to 10: {range2}")

range3 = np.arange(0, 20, 2)  # Even numbers from 0 to 18
print(f"Even numbers: {range3}")

range4 = np.arange(1, 2, 0.1)  # Float step (length can be off by one due to rounding; linspace is safer)
print(f"Float step: {range4}")
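
Because `np.arange` derives its length from `(stop - start) / step`, floating-point rounding can occasionally add or drop an element when the step is not an integer. A hedged sketch of the usual alternative, using `np.linspace` with an explicit point count (the variable names are illustrative):

```python
import numpy as np

# Ask for an exact number of points instead of a float step
tenths = np.linspace(1, 2, 10, endpoint=False)  # 1.0, 1.1, ..., 1.9
print(tenths)

# Include the endpoint by requesting 11 points
tenths_closed = np.linspace(1, 2, 11)  # 1.0, 1.1, ..., 2.0
print(len(tenths), len(tenths_closed))  # 10 11
```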

In [None]:
# linspace: create evenly spaced numbers over a specified interval
# linspace(start, stop, num_points)

linear1 = np.linspace(0, 1, 5)  # 5 points from 0 to 1
print(f"5 points from 0 to 1: {linear1}")

linear2 = np.linspace(0, 10, 11)  # 11 points from 0 to 10
print(f"11 points from 0 to 10: {linear2}")

# Useful for plotting
x = np.linspace(-np.pi, np.pi, 9)  # Points from -π to π
print(f"\nPoints from -π to π:")
print(x)
print(f"Sine values:")
print(np.sin(x).round(3))

### Random Array Generation

Random arrays are essential for simulations, testing, and machine learning:

In [None]:
# Set seed for reproducibility
np.random.seed(42)

# Uniform distribution [0, 1)
uniform = np.random.rand(3, 3)
print("Uniform distribution [0, 1):")
print(uniform.round(3))

# Normal distribution (mean=0, std=1)
normal = np.random.randn(3, 3)
print("\nStandard normal distribution:")
print(normal.round(3))

# Random integers
integers = np.random.randint(0, 10, size=(3, 4))  # 0 to 9
print("\nRandom integers [0, 10):")
print(integers)

In [None]:
# More random distributions
np.random.seed(42)

# Custom normal distribution
custom_normal = np.random.normal(loc=100, scale=15, size=1000)
print(f"Custom normal (mean≈100, std≈15):")
print(f"  Actual mean: {custom_normal.mean():.2f}")
print(f"  Actual std: {custom_normal.std():.2f}")

# Binomial distribution (coin flips)
coin_flips = np.random.binomial(n=1, p=0.5, size=20)  # 20 coin flips
print(f"\n20 coin flips (0=tails, 1=heads): {coin_flips}")
print(f"Number of heads: {coin_flips.sum()}")

# Choice from array
choices = np.random.choice(['A', 'B', 'C', 'D'], size=10, p=[0.1, 0.3, 0.4, 0.2])
print(f"\nWeighted random choices: {choices}")
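
The cells above use NumPy's legacy global random state (`np.random.seed`, `np.random.rand`, and so on). NumPy also provides the newer `Generator` interface through `np.random.default_rng`, which the NumPy documentation recommends for new code. A rough equivalent of the calls above, as a sketch; note that the generator produces different numbers from the legacy functions even with the same seed.

```python
import numpy as np

rng = np.random.default_rng(42)  # independent, seeded generator

uniform = rng.random((3, 3))                 # uniform on [0, 1)
normal = rng.standard_normal((3, 3))         # mean 0, std 1
integers = rng.integers(0, 10, size=(3, 4))  # integers in [0, 10)
scores = rng.normal(loc=100, scale=15, size=1000)
choices = rng.choice(['A', 'B', 'C', 'D'], size=10, p=[0.1, 0.3, 0.4, 0.2])

print(uniform.shape, normal.shape, integers.shape)
print(f"Sample mean: {scores.mean():.2f}")
```

Passing an `rng` object around explicitly, rather than reseeding global state, keeps separate parts of a program from interfering with each other's random streams.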

### 🎯 Practice Exercise 1: Array Creation

Create various arrays based on specifications:

In [None]:
# Exercise: Create the following arrays
# 1. A 5x5 matrix with all elements as 3
# 2. A 4x4 matrix with 1s on the border and 0s inside
# 3. An array of 20 evenly spaced points between 0 and 2π
# 4. A 3x3x3 array with random values

# Your code here:
# TODO: Create matrix_3s (5x5 filled with 3)
# TODO: Create border_matrix (4x4 with 1s on border)
# TODO: Create angle_points (20 points from 0 to 2π)
# TODO: Create cube_array (3x3x3 random)


# Test your solution (uncomment when ready):
# print("5x5 matrix of 3s:")
# print(matrix_3s)
# print("\n4x4 border matrix:")
# print(border_matrix)
# print("\nAngle points shape:", angle_points.shape)
# print("\n3D array shape:", cube_array.shape)

## Part 2: Array Indexing and Slicing

### Understanding Array Indexing

NumPy provides powerful and flexible ways to access and modify array elements. Understanding indexing is crucial for effective data manipulation.

### 1D Array Indexing

In [None]:
# Create a sample 1D array
arr = np.array([10, 20, 30, 40, 50, 60, 70, 80, 90])
print(f"Original array: {arr}")

# Basic indexing (0-based)
print(f"\nFirst element (index 0): {arr[0]}")
print(f"Third element (index 2): {arr[2]}")
print(f"Last element (index -1): {arr[-1]}")
print(f"Second to last (index -2): {arr[-2]}")

# Slicing [start:stop:step]
print(f"\nElements 1 to 4: {arr[1:5]}")
print(f"First 3 elements: {arr[:3]}")
print(f"Last 3 elements: {arr[-3:]}")
print(f"Every 2nd element: {arr[::2]}")
print(f"Reverse array: {arr[::-1]}")

### 2D Array Indexing

In [None]:
# Create a 2D array (matrix)
matrix = np.array([[1, 2, 3, 4],
                   [5, 6, 7, 8],
                   [9, 10, 11, 12],
                   [13, 14, 15, 16]])

print("Original matrix:")
print(matrix)

# Single element access [row, column]
print(f"\nElement at row 0, col 0: {matrix[0, 0]}")
print(f"Element at row 1, col 2: {matrix[1, 2]}")
print(f"Element at row -1, col -1: {matrix[-1, -1]}")

# Row access
print(f"\nFirst row: {matrix[0]}")
print(f"Second row: {matrix[1, :]}")

# Column access
print(f"\nFirst column: {matrix[:, 0]}")
print(f"Third column: {matrix[:, 2]}")

In [None]:
# Advanced 2D slicing
print("Original matrix:")
print(matrix)

# Submatrix extraction
print("\nTop-left 2x2 submatrix:")
print(matrix[:2, :2])

print("\nBottom-right 2x2 submatrix:")
print(matrix[-2:, -2:])

print("\nMiddle 2x2 submatrix:")
print(matrix[1:3, 1:3])

print("\nEvery other row and column:")
print(matrix[::2, ::2])

print("\nReverse rows:")
print(matrix[::-1, :])

print("\nReverse columns:")
print(matrix[:, ::-1])

### Boolean Indexing (Masking)

One of NumPy's most powerful features is the ability to use boolean arrays for indexing:

In [None]:
# Create sample data
data = np.array([23, 45, 67, 89, 12, 34, 56, 78, 90, 11])
print(f"Original data: {data}")

# Create boolean mask
mask = data > 50
print(f"\nBoolean mask (data > 50): {mask}")

# Use mask to filter
filtered = data[mask]
print(f"Filtered data (> 50): {filtered}")

# Direct boolean indexing
print(f"\nData between 30 and 70: {data[(data >= 30) & (data <= 70)]}")
print(f"Data < 20 or > 80: {data[(data < 20) | (data > 80)]}")

# Modify values using boolean indexing
data_copy = data.copy()
data_copy[data_copy < 50] = 0
print(f"\nSet values < 50 to 0: {data_copy}")

In [None]:
# Boolean indexing with 2D arrays
matrix = np.random.randint(0, 100, size=(5, 5))
print("Random matrix:")
print(matrix)

# Find all elements > 50
print(f"\nElements > 50: {matrix[matrix > 50]}")

# Count elements > 50
print(f"Number of elements > 50: {(matrix > 50).sum()}")

# Replace values conditionally
matrix_modified = matrix.copy()
matrix_modified[matrix_modified > 50] = 100
matrix_modified[matrix_modified <= 50] = 0
print("\nBinary matrix (0 if ≤50, 100 if >50):")
print(matrix_modified)
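
The two-step masked assignment above can also be written with `np.where`, which chooses between two values element by element. A small sketch with a fixed array so the result is easy to check (the array `grid` is made up for illustration):

```python
import numpy as np

grid = np.array([[10, 60],
                 [51, 50]])

# 100 where the condition holds, 0 elsewhere
binary = np.where(grid > 50, 100, 0)
print(binary)
# [[  0 100]
#  [100   0]]
```

Unlike in-place masked assignment, `np.where` returns a new array and leaves `grid` unchanged.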

### Fancy Indexing

Use arrays of indices to access multiple elements simultaneously:

In [None]:
# 1D fancy indexing
arr = np.array([10, 20, 30, 40, 50, 60, 70, 80, 90])
indices = [0, 2, 5, 8]

print(f"Original array: {arr}")
print(f"Indices to select: {indices}")
print(f"Selected elements: {arr[indices]}")

# Reorder elements
reorder = [8, 0, 4, 2, 6, 1, 7, 3, 5]
print(f"\nReordered array: {arr[reorder]}")

# 2D fancy indexing
matrix = np.arange(1, 17).reshape(4, 4)
print("\nOriginal matrix:")
print(matrix)

# Select specific elements
rows = [0, 1, 2, 3]
cols = [0, 1, 2, 3]
diagonal = matrix[rows, cols]
print(f"\nDiagonal elements: {diagonal}")

# Select multiple specific elements
rows = [0, 1, 3]
cols = [1, 2, 0]
specific = matrix[rows, cols]
print(f"Elements at (0,1), (1,2), (3,0): {specific}")
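
Passing two index lists, as above, pairs them up element by element. To select the full grid of chosen rows crossed with chosen columns instead, `np.ix_` can build the indexers. A short sketch on the same 4x4 matrix (the names `paired` and `block` are illustrative):

```python
import numpy as np

matrix = np.arange(1, 17).reshape(4, 4)

# Paired indexing: elements (0, 1) and (2, 3)
paired = matrix[[0, 2], [1, 3]]
print(paired)  # [ 2 12]

# Grid indexing: rows 0 and 2 crossed with columns 1 and 3
block = matrix[np.ix_([0, 2], [1, 3])]
print(block)
# [[ 2  4]
#  [10 12]]
```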

### 🎯 Practice Exercise 2: Indexing and Slicing

Practice array manipulation using various indexing techniques:

In [None]:
# Exercise: Array Manipulation
# Given a 6x6 matrix of random integers (0-100):
# 1. Extract the 3x3 center submatrix
# 2. Get all elements on the main diagonal
# 3. Find all elements greater than 75
# 4. Set all elements in the corners to -1

# Create the matrix
np.random.seed(10)
matrix = np.random.randint(0, 100, size=(6, 6))
print("Original matrix:")
print(matrix)

# Your code here:
# TODO: Extract center 3x3
# TODO: Get diagonal elements
# TODO: Find elements > 75
# TODO: Set corners to -1


# Test your solution (uncomment when ready):
# print("\nCenter 3x3:")
# print(center)
# print("\nDiagonal:", diagonal)
# print("\nElements > 75:", large_elements)
# print("\nMatrix with -1 corners:")
# print(matrix)

## Part 3: Array Operations and Broadcasting

### Vectorized Operations

NumPy's true power comes from vectorization - performing operations on entire arrays without writing explicit loops:

In [None]:
# Element-wise operations
a = np.array([1, 2, 3, 4, 5])
b = np.array([10, 20, 30, 40, 50])

print(f"Array a: {a}")
print(f"Array b: {b}")
print(f"\nAddition (a + b): {a + b}")
print(f"Subtraction (b - a): {b - a}")
print(f"Multiplication (a * b): {a * b}")
print(f"Division (b / a): {b / a}")
print(f"Power (a ** 2): {a ** 2}")
print(f"Modulo (b % a): {b % a}")

In [None]:
# Operations with scalars
arr = np.array([1, 2, 3, 4, 5])

print(f"Original array: {arr}")
print(f"\nAdd 10: {arr + 10}")
print(f"Multiply by 2: {arr * 2}")
print(f"Divide by 2: {arr / 2}")
print(f"Square root: {np.sqrt(arr)}")
print(f"Exponential: {np.exp(arr)}")
print(f"Natural log: {np.log(arr)}")

### Understanding Broadcasting

Broadcasting is NumPy's powerful mechanism for performing operations on arrays of different shapes. It follows specific rules to "broadcast" smaller arrays across larger ones.

In [None]:
# Broadcasting scalar to array
arr = np.array([[1, 2, 3],
                [4, 5, 6],
                [7, 8, 9]])

print("Original array:")
print(arr)
print("\nAdd 10 (scalar broadcasts to all elements):")
print(arr + 10)

In [None]:
# Broadcasting 1D array to 2D array
matrix = np.array([[1, 2, 3],
                   [4, 5, 6],
                   [7, 8, 9]])

row_vector = np.array([10, 20, 30])
col_vector = np.array([[100], [200], [300]])

print("Matrix shape:", matrix.shape)
print("Row vector shape:", row_vector.shape)
print("Column vector shape:", col_vector.shape)

print("\nMatrix:")
print(matrix)
print("\nRow vector:", row_vector)
print("\nMatrix + row vector (broadcasts across rows):")
print(matrix + row_vector)

print("\nColumn vector:")
print(col_vector)
print("\nMatrix + column vector (broadcasts across columns):")
print(matrix + col_vector)

In [None]:
# Broadcasting rules demonstration
print("Broadcasting Rules:")
print("1. Two dimensions are compatible if they are equal or one of them is 1")
print("2. Shapes are compared from the trailing (rightmost) dimension, working leftward\n")

# Example 1: Compatible shapes
a = np.ones((3, 4))  # Shape: (3, 4)
b = np.ones(4)       # Shape: (4,) -> broadcasts to (3, 4)
result = a + b
print(f"Shape (3,4) + Shape (4,) = Shape {result.shape}")

# Example 2: More complex broadcasting
a = np.ones((3, 1, 4))  # Shape: (3, 1, 4)
b = np.ones((1, 5, 4))  # Shape: (1, 5, 4)
result = a + b           # Broadcasts to (3, 5, 4)
print(f"Shape (3,1,4) + Shape (1,5,4) = Shape {result.shape}")

# Example 3: Incompatible shapes (would raise error)
print("\nIncompatible example:")
print("Shape (3,4) + Shape (3,) would fail")
print("(Last dimensions 4 and 3 are incompatible)")
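
The incompatible case described above can be run safely by catching the `ValueError`, and it can be fixed by adding a trailing axis so the length-3 array lines up with the rows. A minimal sketch:

```python
import numpy as np

a = np.ones((3, 4))
b = np.arange(3)  # shape (3,)

try:
    a + b  # trailing dimensions 4 and 3 do not match
except ValueError as err:
    print("Broadcast failed:", err)

# Reshape b to (3, 1) so it broadcasts across the 4 columns
fixed = a + b[:, np.newaxis]
print(fixed.shape)  # (3, 4)
```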

### Matrix Operations

In [None]:
# Matrix multiplication
A = np.array([[1, 2],
              [3, 4]])

B = np.array([[5, 6],
              [7, 8]])

print("Matrix A:")
print(A)
print("\nMatrix B:")
print(B)

# Element-wise multiplication
print("\nElement-wise multiplication (A * B):")
print(A * B)

# Matrix multiplication (dot product)
print("\nMatrix multiplication (A @ B):")
print(A @ B)

# Alternative: np.dot()
print("\nUsing np.dot(A, B):")
print(np.dot(A, B))

# Transpose
print("\nA transpose:")
print(A.T)

# Matrix inverse
print("\nA inverse:")
print(np.linalg.inv(A))

# Verify: A @ A_inv = Identity
print("\nA @ A_inverse (should be identity):")
print(np.round(A @ np.linalg.inv(A), 10))
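
When the inverse is only needed to solve a linear system `A x = b`, `np.linalg.solve` is generally preferred over forming the inverse explicitly, since it is typically faster and numerically more stable. A sketch using the same `A`; the right-hand side `b` is an illustrative choice, not from the original notebook.

```python
import numpy as np

A = np.array([[1, 2],
              [3, 4]])
b = np.array([5, 6])  # illustrative right-hand side

# Solve A @ x = b without computing the inverse
x = np.linalg.solve(A, b)
print(x)

# Check the solution by substituting back
print(np.allclose(A @ x, b))  # True
```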

### Aggregation Functions

In [None]:
# Create sample data
data = np.random.randn(4, 5) * 10 + 50
print("Sample data (4x5 matrix):")
print(data.round(2))

# Basic aggregations
print(f"\nSum of all elements: {data.sum():.2f}")
print(f"Mean: {data.mean():.2f}")
print(f"Standard deviation: {data.std():.2f}")
print(f"Variance: {data.var():.2f}")
print(f"Minimum: {data.min():.2f}")
print(f"Maximum: {data.max():.2f}")

# Aggregations along axes
print(f"\nSum along rows (axis=1): {data.sum(axis=1).round(2)}")
print(f"Sum along columns (axis=0): {data.sum(axis=0).round(2)}")
print(f"Mean of each row: {data.mean(axis=1).round(2)}")
print(f"Mean of each column: {data.mean(axis=0).round(2)}")
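
A reduction along an axis drops that axis from the result. Passing `keepdims=True` keeps it as a length-1 dimension, which makes the result broadcast cleanly against the original array. A small sketch with made-up data:

```python
import numpy as np

data = np.array([[1.0, 2.0, 3.0],
                 [10.0, 20.0, 30.0]])

col_means = data.mean(axis=0)                 # shape (3,): one mean per column
row_means = data.mean(axis=1, keepdims=True)  # shape (2, 1): one mean per row

# Subtract each row's mean from that row via broadcasting
centered_rows = data - row_means
print(col_means.shape, row_means.shape)  # (3,) (2, 1)
print(centered_rows.mean(axis=1))        # each row now averages to zero
```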

In [None]:
# Finding indices of min/max
arr = np.array([45, 23, 67, 89, 12, 34, 56, 78])
print(f"Array: {arr}")
print(f"\nMinimum value: {arr.min()}")
print(f"Index of minimum: {arr.argmin()}")
print(f"Maximum value: {arr.max()}")
print(f"Index of maximum: {arr.argmax()}")

# Cumulative operations
print(f"\nCumulative sum: {arr.cumsum()}")
print(f"Cumulative product: {arr.cumprod()}")

# Sorting
print(f"\nSorted array: {np.sort(arr)}")
print(f"Indices that would sort: {np.argsort(arr)}")
print(f"Original array unchanged: {arr}")

### 🎯 Practice Exercise 3: Operations and Broadcasting

Apply operations and broadcasting concepts:

In [None]:
# Exercise: Data Normalization
# Given a matrix of test scores:
# 1. Normalize each column to have mean=0, std=1 (z-score)
# 2. Scale each row to range [0, 100]
# 3. Calculate the correlation between columns

# Create test score data (students x subjects)
np.random.seed(42)
scores = np.random.randint(60, 100, size=(10, 4))
print("Original scores (10 students, 4 subjects):")
print(scores)

# Your code here:
# TODO: Calculate z-scores for each column
# TODO: Scale each row to [0, 100]
# TODO: Calculate correlation matrix


# Test your solution (uncomment when ready):
# print("\nZ-score normalized (by column):")
# print(z_scores.round(2))
# print("\nScaled to [0, 100] (by row):")
# print(scaled.round(0))
# print("\nCorrelation matrix:")
# print(correlation.round(3))

## Part 4: Statistical Functions

### Basic Statistics with NumPy

NumPy provides comprehensive statistical functions for data analysis:

In [None]:
# Generate sample data
np.random.seed(42)
data = np.random.randn(1000) * 15 + 100  # Normal dist: mean=100, std=15

print("Statistical Analysis of 1000 data points:")
print("="*40)
print(f"Mean: {data.mean():.2f}")
print(f"Median: {np.median(data):.2f}")
print(f"Standard Deviation: {data.std():.2f}")
print(f"Variance: {data.var():.2f}")
print(f"Minimum: {data.min():.2f}")
print(f"Maximum: {data.max():.2f}")
print(f"Range: {data.max() - data.min():.2f}")

# Percentiles
print("\nPercentiles:")
print(f"25th percentile (Q1): {np.percentile(data, 25):.2f}")
print(f"50th percentile (Median): {np.percentile(data, 50):.2f}")
print(f"75th percentile (Q3): {np.percentile(data, 75):.2f}")
print(f"IQR (Q3 - Q1): {np.percentile(data, 75) - np.percentile(data, 25):.2f}")

# Multiple percentiles at once
percentiles = [5, 25, 50, 75, 95]
values = np.percentile(data, percentiles)
print("\nPercentile Distribution:")
for p, v in zip(percentiles, values):
    print(f"  {p:3d}%: {v:6.2f}")
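
Note that `std()` and `var()` default to the population formulas, which divide by N. For the sample estimate, which divides by N - 1, pass `ddof=1`. A short sketch on a small made-up array where the values are easy to verify by hand:

```python
import numpy as np

sample = np.array([2, 4, 4, 4, 5, 5, 7, 9])

pop_std = sample.std()           # divides by N
sample_std = sample.std(ddof=1)  # divides by N - 1

print(f"Population std: {pop_std:.3f}")     # 2.000
print(f"Sample std:     {sample_std:.3f}")  # 2.138
```

With 1000 points, as in the cell above, the two values differ only slightly, but the distinction matters for small samples.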

### Statistics on Multi-dimensional Arrays

In [None]:
# Create 2D data (e.g., measurements over time)
np.random.seed(42)
measurements = np.random.randint(50, 150, size=(5, 7))  # 5 sensors, 7 days
sensor_names = ['Sensor A', 'Sensor B', 'Sensor C', 'Sensor D', 'Sensor E']
days = ['Mon', 'Tue', 'Wed', 'Thu', 'Fri', 'Sat', 'Sun']

print("Sensor measurements (5 sensors × 7 days):")
print("Days:", ' '.join(f"{d:>5}" for d in days))
for i, name in enumerate(sensor_names):
    print(f"{name}: {measurements[i]}")

# Statistics across different axes
print("\nStatistics by axis:")
print("Overall mean:", measurements.mean().round(2))
print("\nMean per sensor (across days):")
for name, mean in zip(sensor_names, measurements.mean(axis=1)):
    print(f"  {name}: {mean:.2f}")

print("\nMean per day (across sensors):")
for day, mean in zip(days, measurements.mean(axis=0)):
    print(f"  {day}: {mean:.2f}")

print("\nMax reading per sensor:", measurements.max(axis=1))
print("Min reading per day:", measurements.min(axis=0))

### Correlation and Covariance

In [None]:
# Create correlated data
np.random.seed(42)
n_points = 100

# Variable 1: Random normal
x = np.random.randn(n_points)

# Variable 2: Strongly correlated with x
y = 2 * x + np.random.randn(n_points) * 0.5

# Variable 3: Weakly correlated with x
z = 0.5 * x + np.random.randn(n_points) * 2

# Variable 4: Uncorrelated
w = np.random.randn(n_points)

# Stack into matrix
data = np.column_stack([x, y, z, w])
var_names = ['X', 'Y (strong)', 'Z (weak)', 'W (none)']

# Calculate correlation matrix
correlation = np.corrcoef(data.T)

print("Correlation Matrix:")
print("     ", '  '.join(f"{name:>10}" for name in var_names))
for i, name in enumerate(var_names):
    print(f"{name:>5}:", ' '.join(f"{correlation[i, j]:10.3f}" for j in range(4)))

print("\nInterpretation:")
print("- Values close to 1: Strong positive correlation")
print("- Values close to -1: Strong negative correlation")
print("- Values close to 0: No linear correlation")

# Covariance matrix
covariance = np.cov(data.T)
print("\nCovariance Matrix (diagonal = variance):")
for i in range(4):
    print(f"{var_names[i]} variance: {covariance[i, i]:.3f}")
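
Correlation is covariance rescaled by the standard deviations of the two variables. The sketch below checks that relationship numerically on freshly generated data (not the arrays above), and uses `rowvar=False` as an alternative to transposing:

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.standard_normal(200)
y = 2 * x + rng.standard_normal(200) * 0.5
data = np.column_stack([x, y])

# rowvar=False treats columns as variables (same as passing data.T)
cov = np.cov(data, rowvar=False)
corr = np.corrcoef(data, rowvar=False)

# Correlation = covariance / (std_i * std_j)
std = np.sqrt(np.diag(cov))
corr_from_cov = cov / np.outer(std, std)

print(np.allclose(corr, corr_from_cov))  # True
```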

## Part 5: Performance - Why NumPy is Fast

### Comparing NumPy vs Pure Python

Let's demonstrate why NumPy is so much faster than pure Python:

In [None]:
import time

# Define functions for comparison
def python_sum_squares(n):
    """Calculate sum of squares using Python list"""
    numbers = list(range(n))
    result = 0
    for num in numbers:
        result += num ** 2
    return result

def numpy_sum_squares(n):
    """Calculate sum of squares using NumPy"""
    numbers = np.arange(n, dtype=np.int64)  # int64 avoids overflow where the default integer is 32-bit
    return (numbers ** 2).sum()

# Test with different sizes
sizes = [1000, 10000, 100000, 1000000]

print("Performance Comparison: Sum of Squares")
print("="*50)
print(f"{'Size':>10} {'Python (ms)':>15} {'NumPy (ms)':>15} {'Speedup':>10}")
print("-"*50)

for n in sizes:
    # Time Python version (perf_counter is a high-resolution timer meant for benchmarking)
    start = time.perf_counter()
    python_result = python_sum_squares(n)
    python_time = (time.perf_counter() - start) * 1000  # Convert to ms
    
    # Time NumPy version
    start = time.perf_counter()
    numpy_result = numpy_sum_squares(n)
    numpy_time = (time.perf_counter() - start) * 1000  # Convert to ms
    
    # Calculate speedup
    speedup = python_time / numpy_time if numpy_time > 0 else float('inf')
    
    print(f"{n:>10,} {python_time:>15.2f} {numpy_time:>15.2f} {speedup:>9.1f}x")
    
    # Verify results are the same
    assert python_result == numpy_result, "Results don't match!"
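
A single timed run can be noisy. The standard library's `timeit` module repeats a call many times, and taking the minimum over several repeats gives a steadier estimate. The following is a sketch with illustrative helper names (`loop_version`, `numpy_version`) and a small `n` so it runs quickly:

```python
import timeit
import numpy as np

n = 10_000
numbers = np.arange(n, dtype=np.int64)

def loop_version():
    # Pure Python sum of squares
    return sum(i * i for i in range(n))

def numpy_version():
    # Vectorized sum of squares
    return int((numbers ** 2).sum())

# Best of 5 repeats, 20 calls each, reported per call
loop_t = min(timeit.repeat(loop_version, number=20, repeat=5)) / 20
numpy_t = min(timeit.repeat(numpy_version, number=20, repeat=5)) / 20

print(f"Loop:  {loop_t * 1e6:.1f} µs per call")
print(f"NumPy: {numpy_t * 1e6:.1f} µs per call")
```

In a notebook, the `%timeit` magic offers the same idea with less code.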

### Memory Efficiency

In [None]:
import sys

# Compare memory usage
n = 1000

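# Python list: getsizeof(list) counts only the pointer array,
# so add the size of the int objects it references
py_list = list(range(n))
py_size = sys.getsizeof(py_list) + sum(sys.getsizeof(x) for x in py_list)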

# NumPy array
np_array = np.arange(n)
np_size = np_array.nbytes

print("Memory Usage Comparison (1000 integers):")
print("="*40)
print(f"Python list: {py_size:,} bytes")
print(f"NumPy array: {np_size:,} bytes")
print(f"Memory savings: {(1 - np_size/py_size)*100:.1f}%")

print("\nWhy NumPy uses less memory:")
print("- Contiguous memory allocation")
print("- Fixed data type (no per-element type information)")
print("- No Python object overhead")
print("- Efficient C arrays under the hood")
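
The memory used by an array's data is simply the element count times the dtype's item size, so choosing a smaller dtype reduces it proportionally. A brief sketch:

```python
import numpy as np

n = 1000
sizes = {}
for dtype in (np.int8, np.int32, np.int64, np.float64):
    arr = np.zeros(n, dtype=dtype)
    name = np.dtype(dtype).name
    sizes[name] = arr.nbytes  # itemsize * number of elements
    print(f"{name:>8}: itemsize={arr.itemsize}, nbytes={arr.nbytes:,}")
```

Smaller integer types have a narrower value range, so this trade-off only makes sense when the data is known to fit.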

### Vectorization Example

Let's see how vectorization eliminates the need for explicit loops:

In [None]:
# Task: Calculate distance from origin for 2D points
n_points = 100000
np.random.seed(42)

# Generate random 2D points
points = np.random.randn(n_points, 2)

# Python approach with loop
def python_distances(points_list):
    distances = []
    for point in points_list:
        dist = (point[0]**2 + point[1]**2)**0.5
        distances.append(dist)
    return distances

# NumPy vectorized approach
def numpy_distances(points_array):
    return np.sqrt(points_array[:, 0]**2 + points_array[:, 1]**2)

# Even better: use np.linalg.norm
def numpy_distances_optimized(points_array):
    return np.linalg.norm(points_array, axis=1)

# Compare performance
points_list = points.tolist()

start = time.perf_counter()
py_dist = python_distances(points_list)
py_time = time.perf_counter() - start

start = time.perf_counter()
np_dist = numpy_distances(points)
np_time = time.perf_counter() - start

start = time.perf_counter()
np_opt_dist = numpy_distances_optimized(points)
np_opt_time = time.perf_counter() - start

print(f"Distance calculation for {n_points:,} points:")
print("="*50)
print(f"Python loop: {py_time:.4f} seconds")
print(f"NumPy vectorized: {np_time:.4f} seconds ({py_time/np_time:.1f}x faster)")
print(f"NumPy optimized: {np_opt_time:.4f} seconds ({py_time/np_opt_time:.1f}x faster)")

print("\nKey Insight: Vectorization eliminates Python loops,")
print("allowing operations to run at C speed!")

### 🎯 Practice Exercise 4: Performance Optimization

Optimize code using NumPy vectorization:

In [None]:
# Exercise: Image Processing Simulation
# Given a grayscale image (2D array), apply these transformations:
# 1. Increase brightness by 20% (clip results to [0, 255] before converting back to uint8)
# 2. Apply threshold: values > 200 become 255, others unchanged
# 3. Invert the image (255 - pixel_value)
# Compare loop-based vs vectorized approaches

# Create simulated image
np.random.seed(42)
image = np.random.randint(0, 256, size=(100, 100), dtype=np.uint8)

# Your code here:
# TODO: Implement process_image_loop() using loops
# TODO: Implement process_image_vectorized() using NumPy
# TODO: Time both approaches


# Test your solution (uncomment when ready):
# loop_result = process_image_loop(image.copy())
# vector_result = process_image_vectorized(image.copy())
# print("Results match:", np.array_equal(loop_result, vector_result))
# print(f"Loop time: {loop_time:.4f}s")
# print(f"Vectorized time: {vector_time:.4f}s")
# print(f"Speedup: {loop_time/vector_time:.1f}x")

## Summary and Best Practices

### What We've Learned

Congratulations! You've mastered the fundamentals of NumPy:

1. **Array Creation**: Multiple methods to create and initialize arrays
2. **Indexing and Slicing**: Powerful techniques for data access and manipulation
3. **Broadcasting**: Smart operations on arrays of different shapes
4. **Vectorization**: Eliminating loops for massive performance gains
5. **Statistical Functions**: Built-in tools for data analysis

### NumPy Best Practices

1. **Prefer vectorization**: Avoid explicit Python loops over array elements when a vectorized operation exists
2. **Use appropriate dtypes**: Choose the right data type for memory efficiency
3. **Leverage broadcasting**: Understand the rules to write cleaner code
4. **Preallocate arrays**: Create arrays of the final size when possible
5. **Prefer views to copies**: Know which operations return views (basic slicing) and which return copies (boolean and fancy indexing)
6. **Profile your code**: Measure performance to find bottlenecks
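
As an illustration of the preallocation advice in the list above, the sketch below builds the same array three ways. The names are illustrative; the point is that `np.append` copies the whole array on every call, while a preallocated array is filled in place.

```python
import numpy as np

n = 1000

# Slower pattern: np.append copies the whole array on every call
grown = np.array([], dtype=np.float64)
for i in range(n):
    grown = np.append(grown, i ** 0.5)

# Preallocated: allocate once, then fill in place
filled = np.empty(n, dtype=np.float64)
for i in range(n):
    filled[i] = i ** 0.5

# Fully vectorized: no Python loop at all
vectorized = np.sqrt(np.arange(n))

print(np.allclose(grown, filled), np.allclose(filled, vectorized))  # True True
```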

### Common Pitfalls to Avoid

- **Modifying views**: Basic slices are views, not copies, so writing to a slice changes the original array; call `.copy()` when you need independent data
- **Shape mismatches**: Understand broadcasting rules to avoid errors
- **Memory issues**: Large arrays can consume significant memory
- **Type conversions**: Implicit conversions can lose precision
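
The view-versus-copy pitfall in the list above is easy to demonstrate. In this sketch, `np.shares_memory` reports whether two arrays use the same underlying buffer:

```python
import numpy as np

arr = np.arange(10)

view = arr[2:5]         # basic slice: a view onto arr
fancy = arr[[2, 3, 4]]  # fancy indexing: an independent copy

print(np.shares_memory(arr, view))   # True
print(np.shares_memory(arr, fancy))  # False

view[0] = 99    # writes through to arr
fancy[1] = -1   # does not affect arr
print(arr[:5])  # [ 0  1 99  3  4]
```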

### Real-World Applications

NumPy is the foundation for:
- **Data Science**: pandas is built on NumPy
- **Machine Learning**: scikit-learn uses NumPy arrays
- **Deep Learning**: TensorFlow and PyTorch interoperate with NumPy
- **Image Processing**: Images are NumPy arrays
- **Signal Processing**: Audio and sensor data processing
- **Scientific Computing**: Simulations and numerical methods

### What's Next?

Now that you understand NumPy, you're ready to:
- **Learn pandas**: For structured data analysis
- **Explore matplotlib**: For data visualization
- **Study scikit-learn**: For machine learning
- **Practice linear algebra**: Matrix operations for ML

### Final Challenge

Implement a simple neural network layer using only NumPy:

In [None]:
# Final Challenge: Neural Network Layer
# Implement a single neural network layer that:
# 1. Takes input of shape (batch_size, input_features)
# 2. Has weights of shape (input_features, output_features)
# 3. Has bias of shape (output_features,)
# 4. Applies: output = activation(input @ weights + bias)
# 5. Uses ReLU activation: max(0, x)

def neural_layer(input_data, weights, bias):
    """
    Implement a single neural network layer.
    
    Parameters:
    -----------
    input_data : array of shape (batch_size, input_features)
    weights : array of shape (input_features, output_features)
    bias : array of shape (output_features,)
    
    Returns:
    --------
    output : array of shape (batch_size, output_features)
    """
    # TODO: Implement the forward pass
    pass

# Test your implementation (uncomment when ready):
# np.random.seed(42)
# X = np.random.randn(32, 10)  # 32 samples, 10 features
# W = np.random.randn(10, 5)   # Transform to 5 features
# b = np.random.randn(5)        # Bias for 5 features
# output = neural_layer(X, W, b)
# print(f"Input shape: {X.shape}")
# print(f"Output shape: {output.shape}")
# print(f"Output (first 3 samples, first 3 features):")
# print(output[:3, :3].round(3))

---

## Resources for Further Learning

- **NumPy Documentation**: https://numpy.org/doc/stable/
- **NumPy Tutorial**: https://numpy.org/doc/stable/user/absolute_beginners.html
- **100 NumPy Exercises**: https://github.com/rougier/numpy-100
- **NumPy for MATLAB Users**: https://numpy.org/doc/stable/user/numpy-for-matlab-users.html
- **SciPy Lectures**: https://scipy-lectures.org/

Remember: NumPy is the foundation of scientific Python. Master it, and you'll have the tools to tackle any numerical computing challenge!

Happy computing! 🚀🔢