Welcome to Module 7 of our comprehensive Python course, designed to transform you from a beginner to an advanced Python programmer!
In Module 6, we explored advanced concepts like iterators, decorators, and AsyncIO, equipping you with tools for high-performance coding. Now, we dive into data structures and libraries, the backbone of efficient programming and data analysis. This module covers stacks, queues, linked lists, dictionaries in depth, sets and frozen sets, NumPy basics, Pandas for data handling, and Matplotlib/Seaborn for visualization. These topics are crucial for building robust applications, from task managers to data dashboards.
Table of Contents
- Stacks, Queues, Linked Lists
  - Understanding Stacks, Queues, and Linked Lists
  - Implementing with Python
  - Real-World Applications
  - Pros, Cons, and Alternatives
  - Best Practices
  - Example: Building a Task History Manager
- Dictionaries in Depth
  - Advanced Dictionary Operations
  - OrderedDict and DefaultDict
  - Pros, Cons, and Alternatives
  - Best Practices
  - Example: Creating a Word Frequency Counter
- Sets & Frozen Sets
  - Set Operations and Use Cases
  - Frozen Sets for Immutability
  - Pros, Cons, and Alternatives
  - Best Practices
  - Example: Managing Unique User IDs
- NumPy Basics
  - Introduction to NumPy Arrays
  - Array Operations and Broadcasting
  - Pros, Cons, and Alternatives
  - Best Practices
  - Example: Analyzing Stock Prices
- Pandas for Data Handling
  - DataFrames and Series
  - Data Manipulation and Analysis
  - Pros, Cons, and Alternatives
  - Best Practices
  - Example: Processing Sales Data
- Matplotlib/Seaborn for Visualization
  - Creating Plots with Matplotlib
  - Enhancing Visualizations with Seaborn
  - Pros, Cons, and Alternatives
  - Best Practices
  - Example: Visualizing Sales Trends
- Conclusion & Next Steps
1. Stacks, Queues, Linked Lists
Understanding Stacks, Queues, and Linked Lists
Data structures organize and store data efficiently:
- Stacks: Last-In-First-Out (LIFO) structure, like a stack of plates.
- Queues: First-In-First-Out (FIFO) structure, like a line at a store.
- Linked Lists: Nodes linked by pointers, ideal for dynamic data.
Implementing with Python
# Stack: a list supports O(1) push and pop at the end
stack = []
stack.append(1)  # Push
stack.append(2)
print(stack.pop())  # Pop: 2

# Queue: deque supports O(1) appends and pops at both ends
from collections import deque
queue = deque()
queue.append(1)  # Enqueue
queue.append(2)
print(queue.popleft())  # Dequeue: 1

# Singly linked list: each node points to the next one
class Node:
    def __init__(self, data):
        self.data = data
        self.next = None

class LinkedList:
    def __init__(self):
        self.head = None

    def append(self, data):
        new_node = Node(data)
        if not self.head:
            self.head = new_node
            return
        # Walk to the last node, then attach the new one
        current = self.head
        while current.next:
            current = current.next
        current.next = new_node
Real-World Applications
- Stacks: Undo/redo functionality, browser history.
- Queues: Task scheduling, print queues.
- Linked Lists: Playlists, file systems.
Pros, Cons, and Alternatives
Pros:
- Stacks and queues are simple and efficient for specific tasks.
- Linked lists grow and shrink dynamically and allow O(1) insertion or deletion at a known node.
- Python’s collections.deque provides O(1) appends and pops at both ends.
Cons:
- Stacks and queues expose only their ends, so they suit a narrow set of access patterns.
- Linked lists need O(n) traversal for random access, whereas lists index in O(1).
- Manual linked list implementation is error-prone.
Alternatives:
- Lists: For general-purpose sequences; list.pop(0) is O(n), so they make poor queues.
- Arrays (NumPy): For numerical data with fixed size.
- Third-Party Libraries: Like llist for linked lists.
Best Practices
- Use deque for queues; a plain list is fine for a stack, since append() and pop() at the end are O(1).
- Implement linked lists only when dynamic insertion/deletion is needed.
- Avoid unintended cycles in linked lists; a cycle makes traversal loop forever.
- Test edge cases (e.g., empty structures).
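The queue and edge-case advice above can be sketched as follows. This is a minimal illustration, not a definitive pattern; the job names and the safe_dequeue helper are hypothetical and exist only for this example.

```python
from collections import deque

# Queue: deque gives O(1) appends and pops at both ends.
queue = deque()
queue.append("job-1")    # enqueue
queue.append("job-2")
first = queue.popleft()  # dequeue -> "job-1"


def safe_dequeue(q):
    """Return the next item, or None when the queue is empty (hypothetical helper)."""
    return q.popleft() if q else None


queue.popleft()                     # removes "job-2"; the queue is now empty
empty_result = safe_dequeue(queue)  # None instead of an exception

# Popping from an empty deque without a guard raises IndexError.
try:
    deque().popleft()
    raised = False
except IndexError:
    raised = True
```

Whether to return None or let the IndexError propagate is a design choice; the guard simply makes the empty case explicit.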
Example: Building a Task History Manager
# Basic version: a deque used as a stack
from collections import deque

class TaskManager:
    def __init__(self):
        self.history = deque()  # Stack for undo

    def add_task(self, task):
        self.history.append(task)
        return f"Added task: {task}"

    def undo(self):
        if self.history:
            task = self.history.pop()
            return f"Undid task: {task}"
        return "No tasks to undo."

# Test the manager
manager = TaskManager()
print(manager.add_task("Write report"))  # Output: Added task: Write report
print(manager.add_task("Send email"))    # Output: Added task: Send email
print(manager.undo())                    # Output: Undid task: Send email

# Advanced version: a linked stack where each node points to the previous task
class TaskNode:
    def __init__(self, task):
        self.task = task
        self.prev = None

class AdvancedTaskManager:
    def __init__(self):
        self.head = None  # Most recent task

    def add_task(self, task):
        new_node = TaskNode(task)
        new_node.prev = self.head
        self.head = new_node
        return f"Added task: {task}"

    def undo(self):
        if self.head:
            task = self.head.task
            self.head = self.head.prev
            return f"Undid task: {task}"
        return "No tasks to undo."

# Test the advanced manager
manager = AdvancedTaskManager()
print(manager.add_task("Write report"))  # Output: Added task: Write report
print(manager.add_task("Send email"))    # Output: Added task: Send email
print(manager.undo())                    # Output: Undid task: Send email
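The undo/redo use case mentioned earlier needs a second stack. The sketch below is one possible way to extend the example; the UndoRedoManager class and its redo method are assumptions made for illustration and are not part of the managers above.

```python
class UndoRedoManager:
    """Hypothetical extension of the task manager with redo support."""

    def __init__(self):
        self.undo_stack = []  # a plain list works well as a stack
        self.redo_stack = []

    def add_task(self, task):
        self.undo_stack.append(task)
        self.redo_stack.clear()  # a new action invalidates the redo history
        return f"Added task: {task}"

    def undo(self):
        if not self.undo_stack:
            return "No tasks to undo."
        task = self.undo_stack.pop()
        self.redo_stack.append(task)
        return f"Undid task: {task}"

    def redo(self):
        if not self.redo_stack:
            return "No tasks to redo."
        task = self.redo_stack.pop()
        self.undo_stack.append(task)
        return f"Redid task: {task}"


manager = UndoRedoManager()
manager.add_task("Write report")
manager.add_task("Send email")
print(manager.undo())  # Undid task: Send email
print(manager.redo())  # Redid task: Send email
```

Clearing the redo stack on every new task mirrors how most editors behave, but it is a design choice rather than a requirement.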
2. Dictionaries in Depth
Advanced Dictionary Operations
Dictionaries store key-value pairs, supporting:
- Access: dict[key]
- Update: dict[key] = value
- Iteration: dict.items(), dict.keys(), dict.values()
- Merging: dict1 | dict2 (Python 3.9+)
user = {"name": "Alice", "age": 30}
user["email"] = "alice@example.com"
print(user.items()) # Output: dict_items([('name', 'Alice'), ('age', 30), ('email', 'alice@example.com')])
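Merging and iteration from the list above can be sketched like this. The overrides dictionary is made-up data, and .get() is shown as a common companion to dict[key] even though it is not in the list.

```python
user = {"name": "Alice", "age": 30}
overrides = {"age": 31, "email": "alice@example.com"}  # illustrative data

# Merging with | (Python 3.9+): on duplicate keys the right operand wins.
merged = user | overrides

# .get() returns a fallback instead of raising KeyError for a missing key.
phone = merged.get("phone", "n/a")

# Iterating over items() yields (key, value) pairs in insertion order.
pairs = [f"{key}={value}" for key, value in merged.items()]
print(pairs)  # ['name=Alice', 'age=31', 'email=alice@example.com']
```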
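OrderedDict and DefaultDict
- OrderedDict (collections.OrderedDict): Maintains insertion order. Plain dicts do the same since Python 3.7, so it is mainly useful for move_to_end() and order-sensitive equality.
- DefaultDict (collections.defaultdict): Provides default values for missing keys.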
from collections import defaultdict
word_count = defaultdict(int)
text = "apple banana apple"
for word in text.split():
    word_count[word] += 1
print(word_count)  # Output: defaultdict(<class 'int'>, {'apple': 2, 'banana': 1})
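OrderedDict still offers a few things a plain dict does not. A minimal sketch, using made-up file names as keys:

```python
from collections import OrderedDict

recent = OrderedDict()
recent["a.txt"] = 1
recent["b.txt"] = 2
recent["c.txt"] = 3

recent.move_to_end("a.txt")          # mark "a.txt" as most recently used
oldest = recent.popitem(last=False)  # remove from the front: ('b.txt', 2)
print(list(recent))                  # ['c.txt', 'a.txt']

# OrderedDict equality is order-sensitive; plain dict equality is not.
same_as_dicts = {"x": 1, "y": 2} == {"y": 2, "x": 1}              # True
same_as_ordered = OrderedDict(x=1, y=2) == OrderedDict(y=2, x=1)  # False
```

The move_to_end() and popitem(last=False) pair is a common building block for small least-recently-used caches.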
Pros, Cons, and Alternatives
Pros:
- Fast key-based access (O(1) average).
- Flexible for storing structured data.
- defaultdict simplifies missing key handling.
Cons:
- Memory overhead compared to lists.
- Keys must be hashable.
- No positional indexing, so not ideal for sequence-style access.
Alternatives:
- Lists/Tuples: For ordered data.
- Sets: For unique keys without values.
- Custom Classes: For complex data structures.
Best Practices
- Use descriptive keys for readability.
- Use defaultdict for counting or grouping.
- Pass defaultdict a factory such as int or list, and remember that reading a missing key inserts it.
- Use dictionary comprehension for concise creation.
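Two of these practices are easy to show in code. The sketch below uses made-up words and group names; it also shows a defaultdict caveat that is easy to miss, namely that reading a missing key creates it.

```python
from collections import defaultdict

# Dictionary comprehension: map each word to its length.
words = ["apple", "banana", "kiwi"]
lengths = {word: len(word) for word in words}
print(lengths)  # {'apple': 5, 'banana': 6, 'kiwi': 4}

# defaultdict calls its factory once per missing key, so each key gets its own list.
groups = defaultdict(list)
groups["fruit"].append("apple")
groups["veg"].append("carrot")
print(dict(groups))  # {'fruit': ['apple'], 'veg': ['carrot']}

# Caveat: merely reading a missing key inserts it.
_ = groups["unknown"]
print("unknown" in groups)  # True
```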
Example: Creating a Word Frequency Counter
from collections import defaultdict
import re

def word_frequency(text):
    """Count word frequencies in text."""
    words = re.findall(r'\w+', text.lower())
    freq = defaultdict(int)
    for word in words:
        freq[word] += 1
    return dict(freq)

# Test the counter
text = "The quick brown fox jumps over the lazy dog. The fox is quick."
print(word_frequency(text))  # Output: {'the': 3, 'quick': 2, 'brown': 1, 'fox': 2, 'jumps': 1, 'over': 1, 'lazy': 1, 'dog': 1, 'is': 1}

def group_by_length(text):
    """Group words by their length."""
    words = re.findall(r'\w+', text.lower())
    groups = defaultdict(list)
    for word in words:
        groups[len(word)].append(word)
    return dict(groups)

# Test grouping
print(group_by_length(text))  # Output: {3: ['the', 'fox', 'the', 'dog', 'the', 'fox'], 5: ['quick', 'brown', 'jumps', 'quick'], 4: ['over', 'lazy'], 2: ['is']}
3. Sets & Frozen Sets
Set Operations and Use Cases
Sets store unique, hashable items, supporting:
- Union: set1 | set2
- Intersection: set1 & set2
- Difference: set1 - set2
- Add/Remove: set.add(), set.remove()
set1 = {1, 2, 3}
set2 = {2, 3, 4}
print(set1 | set2) # Output: {1, 2, 3, 4}
Frozen Sets for Immutability
A frozenset is an immutable, hashable set:
frozen = frozenset([1, 2, 3])
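The remaining set operations and the immutability of frozenset can be sketched as follows. The labels dictionary is illustrative, and discard() is shown alongside remove() although it is not in the list above.

```python
set1 = {1, 2, 3}
set2 = {2, 3, 4}

print(set1 & set2)  # {2, 3}
print(set1 - set2)  # {1}

set1.add(5)
set1.discard(99)    # discard() ignores missing items; remove() would raise KeyError

frozen = frozenset([1, 2, 3])
# frozenset has no add() or remove(), so attempts to mutate it fail.
try:
    frozen.add(4)
    mutated = True
except AttributeError:
    mutated = False

# Being hashable, a frozenset can serve as a dictionary key.
labels = {frozen: "small ints"}
print(labels[frozenset([3, 2, 1])])  # small ints
```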
Pros, Cons, and Alternatives
Pros:
- Fast membership testing (O(1) average).
- Efficient for unique data and set operations.
- Frozen sets enable immutability.
Cons:
- No indexing or ordering.
- Limited to hashable elements.
- Less flexible than lists or dictionaries.
Alternatives:
- Lists: For ordered, non-unique data.
- Dictionaries: For key-value pairs.
- NumPy Arrays: For large numerical data (e.g., np.unique, np.intersect1d).
Best Practices
- Use sets for unique data or membership testing.
- Use frozen sets for dictionary keys or immutable data.
- Avoid modifying sets during iteration.
- Use set comprehension for concise creation.
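A short sketch of the last two practices, using made-up email addresses and numbers:

```python
emails = ["A@example.com", "b@example.com", "a@example.com"]  # illustrative data

# Set comprehension: normalize and de-duplicate in one pass.
unique_emails = {email.lower() for email in emails}
print(len(unique_emails))  # 2

# Changing a set's size while iterating over it raises RuntimeError,
# so iterate over a snapshot (or build a new set) instead.
numbers = {1, 2, 3, 4, 5, 6}
for n in list(numbers):  # list(numbers) is a copy
    if n % 2 == 0:
        numbers.remove(n)
print(sorted(numbers))  # [1, 3, 5]

# Building a new set gives the same result without any mutation.
odds = {n for n in range(1, 7) if n % 2}
```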
Example: Managing Unique User IDs
def manage_users(new_users, existing_users):
    """Add new users, ensuring uniqueness."""
    existing = set(existing_users)
    new = set(new_users)
    added = new - existing
    existing.update(added)
    return sorted(existing)  # Sets are unordered, so sort for a stable result

# Test user management
existing = ["user1", "user2"]
new = ["user2", "user3", "user4"]
print(manage_users(new, existing))  # Output: ['user1', 'user2', 'user3', 'user4']

def cache_results(inputs):
    """Cache sums keyed by the set of inputs, ignoring order."""
    cache = {}
    for input_set in inputs:
        frozen = frozenset(input_set)
        if frozen not in cache:
            cache[frozen] = sum(input_set)
    return cache

# Test caching
inputs = [[1, 2], [2, 1], [3, 4]]
print(cache_results(inputs))  # Output: {frozenset({1, 2}): 3, frozenset({3, 4}): 7}
4. NumPy Basics
Introduction to NumPy Arrays
NumPy provides efficient arrays for numerical computations, supporting:
- Creation: np.array(), np.zeros(), np.ones()
- Operations: Element-wise arithmetic, matrix operations
- Broadcasting: Apply operations across arrays
import numpy as np
arr = np.array([1, 2, 3])
print(arr + 2) # Output: [3 4 5]
matrix = np.array([[1, 2], [3, 4]])
print(matrix * 2)  # Element-wise, not matrix multiplication. Output:
# [[2 4]
#  [6 8]]
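Array creation and broadcasting between arrays of different shapes can be sketched as below. The values are arbitrary and chosen only to make the shapes easy to follow.

```python
import numpy as np

zeros = np.zeros((2, 3))  # 2x3 array of 0.0
ones = np.ones(3)         # 1-D array of three 1.0 values

# Broadcasting: the (3,) row is stretched across both rows of the (2, 3) array.
row = np.array([10, 20, 30])
shifted = zeros + row
print(shifted.shape)  # (2, 3)

# A (3, 1) column combined with a (3,) row yields a (3, 3) grid.
col = np.array([[1], [2], [3]])
grid = col * row
print(grid[2, 1])  # 60  (3 * 20)

# Note: * is element-wise; use @ for matrix multiplication.
matrix = np.array([[1, 2], [3, 4]])
product = matrix @ matrix  # [[7, 10], [15, 22]]
```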
Pros, Cons, and Alternatives
Pros:
- Fast, vectorized operations for numerical data.
- Supports multidimensional arrays.
- Broadcasting simplifies operations.
Cons:
- Overhead for small datasets.
- Requires installation (not built-in).
- Less intuitive for non-numerical data.
Alternatives:
- Lists: For small, non-numerical data.
- Pandas: For tabular data with labels.
- SciPy: For advanced scientific computations.
Best Practices
- Use NumPy for numerical computations, not general-purpose lists.
- Leverage broadcasting to avoid loops.
- Use np.vectorize for convenience with custom scalar functions; it loops internally, so prefer native array operations for speed.
- Check array shapes to avoid broadcasting errors.
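The last two practices can be illustrated with a small sketch. The label function and the price values are hypothetical; np.broadcast_shapes requires NumPy 1.20 or later.

```python
import numpy as np

prices = np.array([95.0, 102.0, 110.0])  # illustrative data


def label(price):
    """Hypothetical scalar function used only for illustration."""
    return 1 if price > 100 else 0


# np.vectorize is a convenience wrapper; internally it still loops in Python.
flags_vectorized = np.vectorize(label)(prices)

# A native array expression gives the same result and is usually faster.
flags_native = np.where(prices > 100, 1, 0)

# Check shape compatibility before combining arrays.
result_shape = np.broadcast_shapes((3, 1), (3,))  # (3, 3)
try:
    np.broadcast_shapes((3,), (4,))
    compatible = True
except ValueError:
    compatible = False
```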
Example: Analyzing Stock Prices
import numpy as np

def analyze_stocks(prices):
    """Calculate stock metrics."""
    prices = np.array(prices)
    returns = np.diff(prices) / prices[:-1]  # Period-over-period returns
    return {
        "mean_price": float(np.mean(prices)),
        "volatility": float(np.std(returns))  # Population standard deviation of returns
    }

# Test analysis
prices = [100, 102, 101, 105, 103]
print(analyze_stocks(prices))  # Output: {'mean_price': 102.2, 'volatility': 0.0234...}
5. Pandas for Data Handling
DataFrames and Series
Pandas provides DataFrames (tables) and Series (columns) for data manipulation:
- DataFrame: 2D labeled data structure.
- Series: 1D labeled array.
import pandas as pd
df = pd.DataFrame({
    "name": ["Alice", "Bob"],
    "age": [30, 25]
})
print(df)
# Output:
#     name  age
# 0  Alice   30
# 1    Bob   25
Data Manipulation and Analysis
- Filtering: df[df['age'] > 25]
- Grouping: df.groupby('column')
- Merging: pd.merge(df1, df2)
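The filtering, grouping, and merging operations listed above can be sketched on two small tables. The people and teams DataFrames and their column names are made up for illustration.

```python
import pandas as pd

people = pd.DataFrame({
    "name": ["Alice", "Bob", "Cara"],
    "age": [30, 25, 41],
    "team": ["red", "blue", "red"],
})
teams = pd.DataFrame({"team": ["red", "blue"], "city": ["Oslo", "Lima"]})

# Filtering: a boolean mask keeps rows where the condition holds.
over_25 = people[people["age"] > 25]

# Grouping: mean age per team.
mean_age = people.groupby("team")["age"].mean()

# Merging: join on the shared "team" column (inner join by default).
merged = pd.merge(people, teams, on="team")
print(merged[["name", "city"]])
```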
Pros, Cons, and Alternatives
Pros:
- Intuitive for tabular data.
- Powerful for data cleaning and analysis.
- Integrates with NumPy and visualization libraries.
Cons:
- Memory-intensive for large datasets.
- Steeper learning curve than lists/dictionaries.
- Requires installation.
Alternatives:
- NumPy: For numerical data without labels.
- Dask: For big data processing.
- SQL: For database-style operations.
Best Practices
- Use vectorized operations instead of loops.
- Handle missing data with fillna() or dropna().
- Use meaningful column names.
- Save DataFrames to CSV or Parquet for persistence.
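A minimal sketch of these practices on made-up product data. The 20% tax rate is an assumed figure, and an in-memory buffer stands in for a real file path.

```python
import io

import numpy as np
import pandas as pd

df = pd.DataFrame({
    "product": ["Laptop", "Mouse", "Cable"],
    "price": [999.99, np.nan, 9.99],
})

dropped = df.dropna(subset=["price"])              # remove rows with a missing price
filled = df.fillna({"price": df["price"].mean()})  # or fill with the column mean

# Vectorized arithmetic on a whole column, no explicit loop.
filled["price_with_tax"] = filled["price"] * 1.2   # 20% rate is an assumed figure

# Persist to CSV; pass a filename instead of the buffer to write to disk.
buffer = io.StringIO()
filled.to_csv(buffer, index=False)
buffer.seek(0)
restored = pd.read_csv(buffer)
```

Filling with the mean is only one strategy; whether to fill or drop depends on what the missing value means in the data.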
Example: Processing Sales Data
import pandas as pd

def analyze_sales(data):
    """Summarize total revenue and number of sales per product."""
    df = pd.DataFrame(data)
    df["date"] = pd.to_datetime(df["date"])
    summary = df.groupby("product")["price"].agg(["sum", "count"])
    return summary

# Test analysis
sales = [
    {"product": "Laptop", "price": 999.99, "date": "2025-08-18"},
    {"product": "Mouse", "price": 29.99, "date": "2025-08-18"},
    {"product": "Laptop", "price": 999.99, "date": "2025-08-19"}
]
print(analyze_sales(sales))
# Output:
#              sum  count
# product
# Laptop   1999.98      2
# Mouse      29.99      1
6. Matplotlib/Seaborn for Visualization
Creating Plots with Matplotlib
Matplotlib creates customizable plots:
- Line Plots: plt.plot()
- Bar Charts: plt.bar()
- Scatter Plots: plt.scatter()
import matplotlib.pyplot as plt
x = [1, 2, 3, 4]
y = [10, 20, 25, 30]
plt.plot(x, y)
plt.title("Line Plot")
plt.show()
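Bar charts and scatter plots follow the same pattern as the line plot. The sketch below draws both side by side with made-up data. It selects the non-interactive Agg backend and saves to an in-memory buffer so that it runs without a display; for interactive use, drop the matplotlib.use line and call plt.show() instead.

```python
import io

import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt

categories = ["A", "B", "C"]  # illustrative data
values = [5, 9, 3]

fig, (ax_bar, ax_scatter) = plt.subplots(1, 2, figsize=(8, 3))

ax_bar.bar(categories, values)
ax_bar.set_title("Bar Chart")
ax_bar.set_ylabel("Value")

ax_scatter.scatter([1, 2, 3, 4], [10, 20, 25, 30])
ax_scatter.set_title("Scatter Plot")
ax_scatter.set_xlabel("x")
ax_scatter.set_ylabel("y")

fig.tight_layout()

# Save to an in-memory buffer; pass a filename such as "plots.png" to write a file.
png = io.BytesIO()
fig.savefig(png, format="png")
plt.close(fig)
```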
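Enhancing Visualizations with Seaborn
Seaborn builds on Matplotlib and adds statistical plot types:
- Heatmaps: sns.heatmap()
- Box Plots: sns.boxplot()
import matplotlib.pyplot as plt
import seaborn as sns
import pandas as pd
df = pd.DataFrame({"x": [1, 2, 3], "y": [10, 20, 25]})
sns.scatterplot(data=df, x="x", y="y")
plt.show()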
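The heatmap and box plot functions listed above can be sketched as follows. The data and the group column are invented for illustration, and the Agg backend is used only so the example runs without a display.

```python
import io

import matplotlib
matplotlib.use("Agg")  # non-interactive backend; omit this line to use plt.show()
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

df = pd.DataFrame({  # illustrative data
    "group": ["a", "a", "a", "b", "b", "b"],
    "x": [1, 2, 3, 4, 5, 6],
    "y": [2, 4, 5, 4, 6, 9],
})

fig, (ax_heat, ax_box) = plt.subplots(1, 2, figsize=(9, 3))

# Heatmap of the correlation matrix of the numeric columns.
corr = df[["x", "y"]].corr()
sns.heatmap(corr, annot=True, vmin=-1, vmax=1, ax=ax_heat)
ax_heat.set_title("Correlation")

# Box plot of y for each group.
sns.boxplot(data=df, x="group", y="y", ax=ax_box)
ax_box.set_title("y by group")

fig.tight_layout()
png = io.BytesIO()
fig.savefig(png, format="png")
plt.close(fig)
```

Passing ax= to each Seaborn call keeps both plots on one Matplotlib figure, which makes later customization with Matplotlib methods straightforward.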
Pros, Cons, and Alternatives
Pros:
- Matplotlib is highly customizable.
- Seaborn simplifies complex statistical plots.
- Integrates with Pandas and NumPy.
Cons:
- Matplotlib has a steep learning curve for customization.
- Seaborn is less flexible for non-statistical plots.
- Requires installation.
Alternatives:
- Plotly: For interactive plots.
- Bokeh: For web-based visualizations.
- Altair: For declarative visualizations.
Best Practices
- Use Seaborn for quick, aesthetic plots.
- Customize Matplotlib for specific needs.
- Save plots to files (plt.savefig()).
- Use meaningful titles, labels, and legends.
Example: Visualizing Sales Trends
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

def visualize_sales(data):
    """Plot total sales per day and save the chart to sales_trend.png."""
    df = pd.DataFrame(data)
    df["date"] = pd.to_datetime(df["date"])
    df = df.groupby("date")["price"].sum().reset_index()
    plt.figure(figsize=(10, 5))
    sns.lineplot(data=df, x="date", y="price")
    plt.title("Daily Sales Trend")
    plt.xlabel("Date")
    plt.ylabel("Total Sales ($)")
    plt.savefig("sales_trend.png")  # Save before show(), which may clear the figure
    plt.show()

# Test visualization
sales = [
    {"product": "Laptop", "price": 999.99, "date": "2025-08-18"},
    {"product": "Mouse", "price": 29.99, "date": "2025-08-18"},
    {"product": "Laptop", "price": 999.99, "date": "2025-08-19"}
]
visualize_sales(sales)
7. Conclusion & Next Steps
Congratulations on mastering Module 7! You’ve learned essential data structures (stacks, queues, linked lists, dictionaries, sets) and powerful libraries (NumPy, Pandas, Matplotlib, Seaborn) for building efficient, data-driven applications like task managers, word counters, user ID trackers, stock analyzers, sales processors, and visualizations.
Next Steps:
- Practice: Enhance the examples (e.g., add features to the sales visualizer).
- Explore: Dive into advanced libraries like SciPy or Plotly.
- Advance: Move to Module 8, covering APIs, databases, and testing.
- Resources:
  - Python Documentation: python.org/doc
  - PEP 8 Style Guide: pep8.org
  - Practice on LeetCode, HackerRank, or Kaggle.