To use Jupyter effectively for machine learning work, consider these advanced techniques:
Magic Commands: Use special commands like %time and %memit to profile your code's execution time and memory usage, crucial information when working with large datasets or complex models.
# Time execution of model training
%time model.fit(X_train, y_train)
# Compare performance of different approaches
import numpy as np
A = np.random.rand(500, 500)
B = np.random.rand(500, 500)
%timeit np.dot(A, B)  # Matrix multiplication with the NumPy function
%timeit A @ B         # Matrix multiplication with the Python operator
import random
L = [random.random() for i in range(100000)]
print("sorting an unsorted list:")
%time L.sort()
Output:
sorting an unsorted list:
CPU times: user 40.6 ms, sys: 896 µs, total: 41.5 ms
Wall time: 41.5 ms
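Note that %time works out of the box, while %memit comes from the memory_profiler package and must be loaded as an extension first. A minimal sketch, assuming memory_profiler is installed (the array size here is arbitrary):
# Load the extension that provides %memit (pip install memory-profiler)
%load_ext memory_profiler
import numpy as np
# Report the peak memory consumed while this statement runs
%memit np.ones((1000, 1000))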
Interactive Widgets: Implement interactive controls using libraries like ipywidgets to create dynamic visualizations or to experiment with hyperparameters without modifying code.
from ipywidgets import interact
import ipywidgets as widgets
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import classification_report

@interact(n_estimators=widgets.IntSlider(min=10, max=200, step=10, value=100),
          max_depth=widgets.IntSlider(min=1, max=20, step=1, value=10))  # value must be an int, not None
def train_and_evaluate(n_estimators, max_depth):
    model = RandomForestClassifier(n_estimators=n_estimators,
                                   max_depth=max_depth,
                                   random_state=42)
    model.fit(X_train, y_train)
    predictions = model.predict(X_test)
    print(classification_report(y_test, predictions))
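Each time a slider moves, interact re-runs train_and_evaluate with the new values, so the classification report refreshes in place. (This assumes X_train, X_test, y_train, and y_test were defined in an earlier cell.)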
Environment Management: Use the %conda and %pip magic commands to manage package installations directly from your notebook, helping ensure reproducibility.
# Install a package directly from the notebook
%pip install lightgbm
# Or with conda
%conda install -c conda-forge lightgbm
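Because reproducibility is the goal, it is worth pinning exact versions and snapshotting the environment; a small sketch (the version number below is illustrative):
# Pin an exact version so reruns install the same build
%pip install lightgbm==4.3.0
# Snapshot the current environment to a file (! runs a shell command)
!pip freeze > requirements.txt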
Cell Output Management: For long-running training processes, consider using progress bars from libraries like tqdm to monitor execution, or set pandas's display.max_rows and display.max_columns options to control how much data is shown.
import pandas as pd
from tqdm.notebook import tqdm

# Control output display
pd.set_option('display.max_columns', 20)
pd.set_option('display.max_rows', 100)

# Create a progress bar for long operations
for i in tqdm(range(100)):
    # Perform some time-consuming operation
    pass
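tqdm also hooks directly into pandas; a minimal sketch using synthetic data (the column name and sizes are arbitrary):
import numpy as np
import pandas as pd
from tqdm.notebook import tqdm
# Register a progress-bar-aware .progress_apply on pandas objects
tqdm.pandas()
df = pd.DataFrame({'x': np.random.rand(100_000)})
# Works like .apply, but renders a live progress bar in the notebook
squared = df['x'].progress_apply(lambda v: v ** 2)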
While Jupyter excels at exploratory work, bridging the gap to production requires careful consideration:
Some teams use tools like nbconvert to extract production-ready code from notebooks, converting exploratory code into Python modules that can be imported and used in production systems. Converting notebooks to other formats also enables:
presentation of information in familiar formats, such as PDF.
publishing of research using LaTeX, which opens the door to embedding notebooks in papers.
collaboration with others who may not use notebooks in their work.
sharing contents with many people via the web using HTML.
# Convert notebook to Python script
jupyter nbconvert --to script my_analysis.ipynb
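For sharing rather than extraction, the same tool renders notebooks as HTML; a quick sketch (the filename is illustrative):
# Render the notebook, outputs included, as a standalone HTML page
jupyter nbconvert --to html my_analysis.ipynb
# Rerun all cells top to bottom before converting
jupyter nbconvert --to html --execute my_analysis.ipynb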