Learn Machine Learning concepts visually - no coding required!
Upload your data, explore patterns, and build predictive models with just a few clicks.
Select the type of analysis you want to perform. Each method has different data requirements.
K-Means Clustering
When to use: When you want to discover natural groups in your data without knowing them beforehand.
Question answered: "Are there distinct groups of customers with similar characteristics?"
Business examples:
Probability & Odds Analysis
When to use: When you want to know HOW MUCH each factor influences an outcome and calculate probabilities.
Question answered: "What is the probability of X happening, and which factors matter most?"
Business examples:
Predict Numeric Values
When to use: When you want to predict a continuous number and understand which factors drive it.
Question answered: "What value should we expect, and which factors matter most?"
Business examples:
Predict Categories with Rules
When to use: When you need clear, explainable rules to classify items into categories.
Question answered: "What rules determine which category something belongs to?"
Business examples:
Predict Numbers with Rules
When to use: When you need explainable rules to predict numeric values and see feature thresholds.
Question answered: "What value should we predict and what rules lead to it?"
Business examples:
or click to browse
CSV files up to 5MB, max 10,000 rows and 50 columns

| Column Name | Type | Missing | Unique Values | Statistics |
|---|---|---|---|---|
Select how to handle missing values for each column. "Remove rows" deletes any row where this column is blank. "Fill with median" (numeric columns only) replaces blanks with the column's middle value.
| Column | Type | Missing | Action |
|---|---|---|---|
Detect and optionally remove statistical outliers from numeric columns.
K-Means clustering finds natural groupings in your data by identifying customers who are similar to each other. Think of it like sorting students into study groups based on their learning styles and interests.
Your goal: Select features that describe your customers (like age, income, spending), choose a number of clusters, and let the algorithm find meaningful segments. Then interpret what makes each segment unique.
Note: Features are automatically standardized (scaled to have mean=0, std=1) before clustering. This ensures features with larger ranges (like income) don't dominate over smaller-ranged features (like age).
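The standardize-then-cluster pipeline described above can be sketched as follows. This is a minimal illustration assuming scikit-learn; the app's actual internals aren't shown here, and the toy data is made up.

```python
# Sketch: standardize features, then run K-Means (assumes scikit-learn).
import numpy as np
from sklearn.preprocessing import StandardScaler
from sklearn.cluster import KMeans

# Toy customer data: [age, income] - income's range is far larger than age's
X = np.array([[25, 30000], [27, 32000], [45, 90000], [47, 95000]], dtype=float)

# Standardize so income's large range doesn't dominate the distance calculation
X_scaled = StandardScaler().fit_transform(X)
print(X_scaled.mean(axis=0).round(6))  # each column now has mean ~0

# Cluster the scaled data into 2 groups
kmeans = KMeans(n_clusters=2, n_init=10, random_state=42).fit(X_scaled)
print(kmeans.labels_)  # the two young/low-income customers land together
```

Without the scaling step, income (tens of thousands) would swamp age (tens) in every distance computation, and the clusters would effectively ignore age.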
Select features and click "Run Clustering" to visualize customer segments
Each plot shows how clusters are distributed across two features. Look for clear separation between colors to identify well-defined segments.
A decision tree creates a series of yes/no questions to classify data - like a flowchart. Each branch represents a decision rule (e.g., "Is income > $50,000?"), and each leaf is a final prediction.
Your goal: Select what you want to predict (target) and which features to use. The tree will show you exactly which factors matter most and the rules it uses to make predictions. Great for explaining decisions to stakeholders!
Select a target, features, and click "Train Decision Tree" to visualize the If/Then logic
Why encoding? Machine learning models work with numbers. Text categories like "Yes/No" or "Male/Female" are converted to numbers (0, 1, 2, etc.) so the algorithm can process them. Use this table to interpret the tree diagram and rules above.
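The encode-then-train flow above can be sketched like this, assuming scikit-learn. The column names and data are illustrative, not the app's actual implementation.

```python
# Sketch: encode a Yes/No target, then train a shallow decision tree.
from sklearn.preprocessing import LabelEncoder
from sklearn.tree import DecisionTreeClassifier, export_text

# Toy data: income (numeric feature) and a Yes/No purchase outcome
incomes = [[20000], [25000], [60000], [80000]]
bought = ["No", "No", "Yes", "Yes"]

# Encoding: text categories become numbers ("No" -> 0, "Yes" -> 1)
le = LabelEncoder()
y = le.fit_transform(bought)
print(dict(zip(le.classes_, range(len(le.classes_)))))  # {'No': 0, 'Yes': 1}

# Train the tree and print its if/then rules, flowchart-style
tree = DecisionTreeClassifier(max_depth=2, random_state=0).fit(incomes, y)
print(export_text(tree, feature_names=["income"]))
```

The printed rules read like the flowchart described above, e.g. a single "income <= threshold" split separating the two classes.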
Logistic regression predicts the probability of an outcome and tells you exactly how much each factor influences that probability. Unlike decision trees, it gives you coefficients that quantify each factor's impact.
Your goal: Select a binary outcome to predict (Yes/No, Buy/Don't Buy) and the features that might influence it. The results will show you odds ratios (e.g., "each $10K income increase doubles the odds of purchase") and let you adjust the decision threshold to balance false positives vs. false negatives.
Configure settings and click "Train Logistic Regression" to see feature coefficients
Why encoding? Machine learning models work with numbers. Text categories like "Yes/No" or "Male/Female" are converted to numbers (0, 1, 2, etc.) so the algorithm can process them. Use this table to interpret the coefficients and confusion matrix.
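A minimal sketch of the coefficient and odds-ratio outputs described above, assuming scikit-learn. The income/purchase data is invented for illustration.

```python
# Sketch: logistic regression, with the odds ratio as exp(coefficient).
import numpy as np
from sklearn.linear_model import LogisticRegression

# Toy data: income in $10K units vs. purchase (1 = bought)
X = np.array([[2], [3], [4], [6], [7], [8]], dtype=float)
y = np.array([0, 0, 0, 1, 1, 1])

model = LogisticRegression().fit(X, y)

# Odds ratio per 1-unit ($10K) increase in income:
# > 1 means higher income raises the odds of purchase
odds_ratio = np.exp(model.coef_[0][0])
print(round(odds_ratio, 2))

# Predicted probability of purchase at income = $50K
prob = model.predict_proba([[5.0]])[0, 1]
print(round(prob, 2))
```

Exponentiating a coefficient converts it from log-odds to an odds ratio, which is why a positive coefficient always corresponds to an odds ratio above 1.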
Linear regression predicts a continuous numeric value (like sales, price, or score) based on input features. It finds the best-fit line through your data, showing you exactly how much each factor contributes to the outcome.
Your goal: Select a numeric outcome to predict and the features that might influence it. The results will show you R² (how well the model fits), p-values (which factors are statistically significant), and residuals (prediction errors).
Configure settings and click "Train Regression Model" to see feature coefficients
| Feature | Coefficient | Std Error | t-Statistic | P-Value | Significance |
|---|---|---|---|---|---|
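A minimal sketch of fitting a linear model and reading off the coefficient and R², assuming scikit-learn (the standard errors and p-values in the table above would come from an OLS-style fit, e.g. statsmodels). The ad-spend/sales data is synthetic.

```python
# Sketch: linear regression recovers a known slope from noisy data.
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score

# Synthetic data: sales = 3.0 * ad_spend + 5.0, plus a little noise
rng = np.random.default_rng(0)
ad_spend = rng.uniform(1, 10, size=30).reshape(-1, 1)
sales = 3.0 * ad_spend.ravel() + 5.0 + rng.normal(0, 0.5, size=30)

model = LinearRegression().fit(ad_spend, sales)
r2 = r2_score(sales, model.predict(ad_spend))

print(round(model.coef_[0], 1))    # close to the true slope of 3.0
print(round(model.intercept_, 1))  # close to the true intercept of 5.0
print(round(r2, 2))                # near 1: most variation explained
```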
A regression tree predicts numeric values by creating if/then rules, just like a classification tree. Instead of predicting categories, each leaf node predicts a number (the average of training samples in that group).
Your goal: Select a numeric variable to predict and features that might influence it. The tree will show you decision rules that lead to different predicted values - great for understanding what drives higher or lower outcomes!
Select a numeric target, features, and click "Train Regression Tree" to visualize the prediction rules
Why encoding? Categorical features like "Yes/No" are converted to numbers so the algorithm can process them.
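The "each leaf predicts the average of its group" behavior can be seen directly in a small sketch, assuming scikit-learn. The house-price data is invented.

```python
# Sketch: a depth-1 regression tree predicts each leaf's group mean.
from sklearn.tree import DecisionTreeRegressor, export_text

# Toy data: house size (sq ft) vs. price ($K), two obvious groups
sizes = [[800], [900], [1000], [2000], [2200], [2400]]
prices = [100, 110, 120, 300, 320, 340]

tree = DecisionTreeRegressor(max_depth=1).fit(sizes, prices)
print(export_text(tree, feature_names=["sqft"]))  # the if/then rule

# The two leaves predict the group averages: 110 for small, 320 for large
print(tree.predict([[850]])[0], tree.predict([[2100]])[0])
```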
| Statistic | Value |
|---|---|
| Mean | -- |
| Std Dev | -- |
| Min | -- |
| Max | -- |
After reviewing your analysis results, write a summary of what the data is telling you. What patterns did you discover? What insights are most important for decision-making? This summary will be included in your PDF report.
In classification trees, entropy measures "messiness" or uncertainty - high entropy means mixed data. In regression trees, we use MSE (Mean Squared Error) instead, measuring how spread out values are. Both types of trees try to reduce these measures by splitting data into cleaner groups.
Gini measures how often you'd be wrong if you randomly guessed a label. Lower Gini = purer groups = better! It's like asking "if I picked a random customer from this group, how likely am I to misclassify them?"
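Both purity measures can be computed by hand from class proportions, using the standard formulas entropy = -Σ p·log₂(p) and Gini = 1 - Σ p² (pure Python, no assumptions beyond the formulas).

```python
# Entropy and Gini impurity computed from class proportions.
from math import log2

def entropy(proportions):
    # "Messiness": 0 for a pure group, maximal for an even mix
    return -sum(p * log2(p) for p in proportions if p > 0)

def gini(proportions):
    # Chance of misclassifying a randomly drawn, randomly labeled point
    return 1 - sum(p * p for p in proportions)

# A perfectly mixed 50/50 group is maximally "messy"...
print(entropy([0.5, 0.5]), gini([0.5, 0.5]))  # 1.0 0.5

# ...while a pure group scores zero on both measures
print(entropy([1.0]), gini([1.0]))  # 0.0 0.0
```

A good split is one whose child groups score lower on these measures than the parent did, which is exactly what the tree optimizes.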
Overfitting is like memorizing answers instead of learning concepts. The model becomes too specific to your training data and fails on new data. It's like a student who memorizes test answers but can't apply knowledge to new problems.
A centroid is the "center point" of a cluster - imagine it as the average customer in each group. K-Means positions centroids to minimize the distance between each point and its nearest centroid.
Clusters are groups of similar data points. Think of it as automatically sorting customers into groups based on their behavior - like "budget shoppers", "premium buyers", and "occasional visitors".
Features are the characteristics or attributes you use to make predictions. For customers, features might include age, income, purchase frequency, etc. They're the "inputs" to your model.
Machine learning algorithms work with numbers, not text. Encoding converts text categories (like "Male/Female" or "Yes/No") into numbers (0, 1, 2...). When you see a number in the results, check the encoding table to see what category it represents.
The target is what you're trying to predict - the "answer" you want the model to learn. For example, "Will this customer buy?" or "Is this email spam?"
Accuracy is simply "how often is the model correct?" If a model has 85% accuracy, it makes the right prediction 85 out of 100 times. Higher is better!
K is how many groups you want to create. Choosing K is part art, part science. Too few clusters = oversimplified groups. Too many = overly specific groups that don't generalize well.
Max depth limits how "tall" your decision tree can grow. A deeper tree can learn more complex patterns but risks overfitting. A shallower tree is simpler and more generalizable.
The silhouette score measures how similar points are to their own cluster vs other clusters. Ranges from -1 to 1: scores near 1 mean well-separated clusters, near 0 means overlapping clusters, negative means points might be in the wrong cluster.
Inertia measures how spread out points are within each cluster. Lower inertia = tighter clusters. The "elbow method" plots inertia vs K to find where adding more clusters stops helping much.
This index measures cluster separation - specifically, how distinct clusters are from each other. Lower values indicate better-defined, more separated clusters. Aim for values closer to 0.
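All three clustering metrics above can be computed in one sketch, assuming scikit-learn and assuming the separation index described above is the Davies-Bouldin index (which matches its description: lower is better, aim near 0). The two toy clusters are deliberately well separated.

```python
# Sketch: silhouette, inertia, and Davies-Bouldin on well-separated clusters.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score, davies_bouldin_score

# Two tight groups of three points each
X = np.array([[0, 0], [0, 1], [1, 0], [10, 10], [10, 11], [11, 10]], dtype=float)

km = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)

print(round(silhouette_score(X, km.labels_), 2))      # near 1: well separated
print(round(km.inertia_, 2))                          # low: tight clusters
print(round(davies_bouldin_score(X, km.labels_), 2))  # near 0: well separated
```

Running the same code with K=3 or K=4 would show inertia continuing to fall while the silhouette and Davies-Bouldin scores worsen, which is the trade-off the elbow method visualizes.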
We divide data into two parts: "training data" to teach the model, and "test data" to evaluate it. This simulates how the model will perform on new, unseen data. Typical splits are 80/20 or 70/30.
Stratified sampling ensures the train and test sets have the same proportion of each class as the original data. This is important when classes are imbalanced (e.g., 90% buyers, 10% non-buyers).
A random seed makes results reproducible. Using the same seed will always produce the same random split, so you can compare different model settings fairly. Change the seed to see how stable your results are.
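The three ideas above (the split, stratification, and the seed) come together in one call, sketched here assuming scikit-learn's `train_test_split`:

```python
# Sketch: a stratified, reproducible 80/20 train/test split.
import numpy as np
from sklearn.model_selection import train_test_split

X = np.arange(20).reshape(-1, 1)
y = np.array([0] * 16 + [1] * 4)   # imbalanced: 80% class 0, 20% class 1

X_train, X_test, y_train, y_test = train_test_split(
    X, y,
    test_size=0.2,     # 80/20 split
    stratify=y,        # keep the class mix similar in both halves
    random_state=42,   # same seed -> same split every run
)

print(len(X_train), len(X_test))              # 16 4
print(int(y_train.sum()), int(y_test.sum()))  # minority class in both halves
```

Without `stratify=y`, an unlucky random split could put every minority-class sample in one half, making the test results misleading.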
Despite its name, logistic regression is used for classification (not regression). It predicts the probability of an outcome (like "Will Buy: Yes/No") based on input features. It's great for understanding which factors influence a decision.
The odds ratio tells you how much the odds of the outcome change when a feature increases by 1 unit. An odds ratio of 2 means the outcome is twice as likely. Less than 1 means less likely. Equal to 1 means no effect.
In logistic regression, coefficients show the direction and strength of each feature's influence. Positive = increases likelihood of the outcome. Negative = decreases likelihood. Larger absolute value = stronger effect.
The ROC (Receiver Operating Characteristic) curve shows how well the model distinguishes between classes at different threshold settings. The area under this curve (AUC) measures overall performance: 1.0 is perfect, 0.5 is random guessing.
Precision answers "Of all positive predictions, how many were correct?" Recall answers "Of all actual positives, how many did we find?" High precision = few false alarms. High recall = few missed cases.
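Both definitions above reduce to simple ratios of confusion-matrix counts, sketched here with made-up counts; the AUC line assumes scikit-learn's `roc_auc_score`.

```python
# Precision and recall from confusion-matrix counts, plus AUC on toy scores.
from sklearn.metrics import roc_auc_score

tp, fp, fn = 40, 10, 20   # true positives, false alarms, missed cases

precision = tp / (tp + fp)   # of all positive predictions, how many correct?
recall = tp / (tp + fn)      # of all actual positives, how many did we find?

print(precision, recall)  # 0.8 and ~0.667

# AUC: how well predicted scores rank positives above negatives
y_true = [0, 0, 1, 1]
y_score = [0.1, 0.4, 0.35, 0.8]
print(roc_auc_score(y_true, y_score))  # 0.75: one pair is ranked wrongly
```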
Linear regression predicts a continuous numeric outcome (like sales or price) based on input features. It finds the best-fit line through your data points. Use it when you want to predict "how much" rather than "which category."
Multiple regression extends linear regression to use multiple input features. It helps you understand how several factors together influence an outcome, and which factors have the strongest impact.
R² measures how well the model explains the variation in your data. It ranges from 0 to 1. R²=0.80 means 80% of the variation is explained by the model. Higher is better, but very high values (>0.95) might indicate overfitting.
Adjusted R² accounts for the number of features in the model. Unlike regular R², it penalizes adding features that don't improve predictions. Use it to compare models with different numbers of features.
The p-value tests if a feature's effect is statistically significant. P < 0.05 is commonly considered significant, meaning there's less than 5% chance the effect is due to random chance. Lower p-values indicate stronger evidence.
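The penalty that adjusted R² applies follows the standard formula 1 - (1 - R²)(n - 1)/(n - k - 1), where n is the number of samples and k the number of features. A pure-Python sketch:

```python
# Adjusted R²: same fit quality, but more features means a bigger penalty.
def adjusted_r2(r2, n, k):
    """n = number of samples, k = number of features."""
    return 1 - (1 - r2) * (n - 1) / (n - k - 1)

# Same R² of 0.80 on 100 samples, with 2 features vs. 20 features
print(round(adjusted_r2(0.80, n=100, k=2), 3))   # 0.796: small penalty
print(round(adjusted_r2(0.80, n=100, k=20), 3))  # 0.749: larger penalty
```

This is why a model that gains a tiny bit of R² by adding many features can still lose on adjusted R².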
Residuals are the differences between actual and predicted values. Good models have residuals that are randomly scattered around zero. Patterns in residuals may indicate the model is missing something important.
RMSE measures the average prediction error in the same units as your target variable. If predicting sales in dollars, RMSE of 100 means predictions are typically off by about $100. Lower is better.
MAE is the average absolute difference between predictions and actual values. Unlike RMSE, it doesn't penalize large errors as heavily. It's easier to interpret: MAE of 50 means predictions are off by 50 on average.
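Both error metrics follow directly from their definitions, sketched here in pure Python with invented numbers:

```python
# RMSE and MAE computed from their definitions.
from math import sqrt

actual = [100, 150, 200, 250]
predicted = [110, 140, 220, 240]

errors = [a - p for a, p in zip(actual, predicted)]  # -10, 10, -20, 10

mae = sum(abs(e) for e in errors) / len(errors)
rmse = sqrt(sum(e * e for e in errors) / len(errors))

print(mae)             # 12.5: off by 12.5 on average
print(round(rmse, 2))  # 13.23: the single -20 error is weighted more heavily
```

Note that RMSE exceeds MAE whenever the errors are uneven; here the one error of 20 pulls RMSE up because it is squared before averaging.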
A regression tree predicts numeric values using if/then rules, just like a classification tree but for continuous outcomes. Each leaf node contains the average value of training samples that reached that node. Great for understanding what drives higher or lower values.