🔹 Decision Tree
Definition: A non-parametric supervised learning algorithm used for both classification and regression. It models decisions as a tree of feature-based tests, where each branch leads to a possible outcome.
Key Terminology
- Root Node: Starting point of the tree that contains all data.
- Decision Nodes: Nodes that split data based on a feature condition.
- Leaf Nodes: Terminal nodes that represent the final prediction.
- Sub-Tree: A smaller section of the entire decision tree.
- Pruning: Reducing tree size by removing nodes to prevent overfitting.
- Parent and Child Nodes: Relationship between a node and the nodes derived from it.
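The pruning idea above can be sketched with scikit-learn's cost-complexity pruning (the `ccp_alpha` parameter of `DecisionTreeClassifier`); the Iris dataset and the alpha value are illustrative choices, not from the notes:

```python
# Minimal sketch of pruning: a larger ccp_alpha removes more nodes,
# trading training accuracy for a simpler, less overfit tree.
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)

# Unpruned tree vs. a cost-complexity-pruned tree (alpha chosen for illustration)
full = DecisionTreeClassifier(random_state=0).fit(X, y)
pruned = DecisionTreeClassifier(ccp_alpha=0.02, random_state=0).fit(X, y)

# Pruning yields a smaller tree (fewer nodes)
print(full.tree_.node_count, pruned.tree_.node_count)
```

In practice, candidate alpha values can be obtained from `cost_complexity_pruning_path` and chosen by cross-validation.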
Entropy
- Measures the uncertainty or impurity of a dataset: H(S) = -Σᵢ pᵢ log₂(pᵢ), where pᵢ is the proportion of samples in class i. Entropy is 0 for a pure node and maximal when classes are evenly mixed.
Information Gain
- Measures the reduction in entropy achieved by partitioning the data on a feature: IG = H(parent) - Σᵥ (|Sᵥ|/|S|) · H(Sᵥ). The feature with the highest gain is chosen for the split.
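Entropy and information gain can be computed directly; this is a minimal NumPy sketch (the helper names and the toy labels are illustrative):

```python
import numpy as np

def entropy(labels):
    """Shannon entropy (in bits) of a 1-D array of class labels."""
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return -np.sum(p * np.log2(p))

def information_gain(parent, splits):
    """Entropy of the parent minus the size-weighted entropy of the splits."""
    n = len(parent)
    weighted = sum(len(s) / n * entropy(s) for s in splits)
    return entropy(parent) - weighted

# Toy example: a split that separates the two classes perfectly
parent = np.array([0, 0, 1, 1])
left, right = np.array([0, 0]), np.array([1, 1])
print(entropy(parent))                          # 1.0 (maximally mixed)
print(information_gain(parent, [left, right]))  # 1.0 (all impurity removed)
```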
from sklearn.tree import DecisionTreeClassifier
# Fit a decision tree on a pre-split dataset (X_train, X_test, y_train assumed defined)
dt_model = DecisionTreeClassifier(random_state=42)
dt_model.fit(X_train, y_train)
# Predict class labels for the held-out samples
y_pred_dt = dt_model.predict(X_test)
🔹 Random Forest
Definition: An ensemble technique that trains many decision trees and combines their outputs (majority vote for classification, averaging for regression). Effective for both tasks and less prone to overfitting than a single tree.
Understanding Through Analogy
- A student seeks advice from multiple people before making a decision. The final decision is based on the majority recommendation.
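The analogy above maps onto scikit-learn's `RandomForestClassifier`: each tree "votes", and the majority class becomes the prediction. A minimal sketch, assuming the Iris dataset and the hyperparameter values as illustrative stand-ins:

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

# 100 trees, each trained on a bootstrap sample; prediction = majority vote
rf_model = RandomForestClassifier(n_estimators=100, random_state=42)
rf_model.fit(X_train, y_train)
y_pred_rf = rf_model.predict(X_test)
print(rf_model.score(X_test, y_test))  # test-set accuracy
```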
Ensemble Methods