To use Jupyter effectively for machine learning work, consider these advanced techniques:
Magic Commands: Use special commands like %time and %memit to profile your code's execution time and memory usage, crucial information when working with large datasets or complex models.
# Time execution of model training
%time model.fit(X_train, y_train)
# Compare performance of different approaches
import numpy as np
A = np.random.rand(500, 500)
B = np.random.rand(500, 500)
%timeit np.dot(A, B)  # Matrix multiplication with the NumPy function
%timeit A @ B         # Matrix multiplication with the Python operator
import random
L = [random.random() for i in range(100000)]
print("sorting an unsorted list:")
%time L.sort()
Output:
sorting an unsorted list:
CPU times: user 40.6 ms, sys: 896 µs, total: 41.5 ms
Wall time: 41.5 ms
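Note that %time works out of the box, while %memit comes from the memory_profiler package and must be loaded as an extension first. A minimal sketch, assuming memory_profiler is installed (the array size here is arbitrary):
# Load the extension that provides %memit (pip install memory-profiler)
%load_ext memory_profiler
import numpy as np
# Report the peak memory consumed while this statement runs
%memit np.ones((1000, 1000))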
Interactive Widgets: Implement interactive controls using libraries like ipywidgets to create dynamic visualizations or to experiment with hyperparameters without modifying code.
from ipywidgets import interact
import ipywidgets as widgets
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import classification_report

@interact(n_estimators=widgets.IntSlider(min=10, max=200, step=10, value=100),
          max_depth=widgets.IntSlider(min=1, max=20, step=1, value=10))  # value must be an int, not None
def train_and_evaluate(n_estimators, max_depth):
    model = RandomForestClassifier(n_estimators=n_estimators,
                                   max_depth=max_depth,
                                   random_state=42)
    model.fit(X_train, y_train)
    predictions = model.predict(X_test)
    print(classification_report(y_test, predictions))
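Each time a slider moves, interact re-runs train_and_evaluate with the new values, so the classification report refreshes in place. (This assumes X_train, X_test, y_train, and y_test were defined in an earlier cell.)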
Environment Management: Use the %conda and %pip magic commands to manage package installations directly from your notebook, helping ensure reproducibility.
# Install a package directly from the notebook
%pip install lightgbm
# Or with conda
%conda install -c conda-forge lightgbm
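Because reproducibility is the goal, it is worth pinning exact versions and snapshotting the environment; a small sketch (the version number below is illustrative):
# Pin an exact version so reruns install the same build
%pip install lightgbm==4.3.0
# Snapshot the current environment to a file (! runs a shell command)
!pip freeze > requirements.txt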
Cell Output Management: For long-running training processes, consider using progress bars from libraries like tqdm to monitor execution, or set pandas's display.max_rows and display.max_columns options to control how much data is shown.
import pandas as pd
from tqdm.notebook import tqdm

# Control output display
pd.set_option('display.max_columns', 20)
pd.set_option('display.max_rows', 100)

# Create a progress bar for long operations
for i in tqdm(range(100)):
    # Perform some time-consuming operation
    pass
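tqdm also hooks directly into pandas; a minimal sketch using synthetic data (the column name and sizes are arbitrary):
import numpy as np
import pandas as pd
from tqdm.notebook import tqdm
# Register a progress-bar-aware .progress_apply on pandas objects
tqdm.pandas()
df = pd.DataFrame({'x': np.random.rand(100_000)})
# Works like .apply, but renders a live progress bar in the notebook
squared = df['x'].progress_apply(lambda v: v ** 2)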
While Jupyter excels at exploratory work, bridging the gap to production requires careful consideration:
Some teams use tools like nbconvert to extract production-ready code from notebooks, converting exploratory code into Python modules that can be imported and used in production systems. Converting notebooks to other formats also enables:
presentation of information in familiar formats, such as PDF.
publishing of research using LaTeX, which opens the door to embedding notebooks in papers.
collaboration with others who may not use notebooks in their work.
sharing contents with many people via the web using HTML.
# Convert notebook to Python script
jupyter nbconvert --to script my_analysis.ipynb
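For sharing rather than extraction, the same tool renders notebooks as HTML; a quick sketch (the filename is illustrative):
# Render the notebook, outputs included, as a standalone HTML page
jupyter nbconvert --to html my_analysis.ipynb
# Rerun all cells top to bottom before converting
jupyter nbconvert --to html --execute my_analysis.ipynb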