What if you could train, evaluate, and select the best machine learning model for your business problem — without writing a single line of ML code? Azure AutoML makes this a reality. In this post, we walk through a real fraud detection use case end-to-end, from data registration to model evaluation, using both the Azure Studio UI and the Python SDK.
1. What is Automated Machine Learning?
Machine learning model development is traditionally a resource-intensive process. Data scientists spend weeks selecting algorithms, engineering features, tuning hyperparameters, and validating results. For businesses without large ML teams, this creates a significant barrier to entry.
Automated Machine Learning (AutoML) solves this by systematically iterating through combinations of algorithms, preprocessing steps, and hyperparameters — selecting the best-performing pipeline based on a metric you define. Azure AutoML, part of Microsoft’s Azure Machine Learning platform, industrialises this process at scale.
Under the hood, Azure AutoML handles three major phases automatically:
- Featurization — imputing missing values, encoding categoricals, normalising numerics, detecting data types
- Algorithm sweeping — trying LightGBM, XGBoost, RandomForest, ExtremeRandomTrees, and more
- Ensembling — optionally combining top models into a voting or stacking ensemble for higher accuracy
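To make the sweep concrete, here is a minimal sketch of the loop AutoML effectively runs, using scikit-learn stand-ins (the real service searches LightGBM, XGBoost, and many more pipelines with far richer featurization; the candidate names and parameters below are illustrative only):

```python
# Conceptual sketch of an AutoML sweep: featurize, try candidates, rank by metric.
from sklearn.datasets import make_classification
from sklearn.ensemble import (ExtraTreesClassifier, GradientBoostingClassifier,
                              RandomForestClassifier)
from sklearn.impute import SimpleImputer
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic imbalanced data standing in for fraud transactions
X, y = make_classification(n_samples=1000, weights=[0.95], random_state=0)

candidates = {
    "RandomForest": RandomForestClassifier(n_estimators=50, random_state=0),
    "ExtremeRandomTrees": ExtraTreesClassifier(n_estimators=50, random_state=0),
    "GradientBoosting": GradientBoostingClassifier(random_state=0),
}

scores = {}
for name, model in candidates.items():
    # Featurization (imputation + scaling) is bundled with every trial
    pipeline = make_pipeline(SimpleImputer(), StandardScaler(), model)
    scores[name] = cross_val_score(pipeline, X, y, cv=5, scoring="roc_auc").mean()

best = max(scores, key=scores.get)
print(f"Best pipeline: {best} (AUC {scores[best]:.3f})")
```

Azure adds the third phase on top: the top-ranked trials can be combined into a voting or stacking ensemble, which usually edges out any single model.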
2. When Should Businesses Use AutoML?
AutoML is exceptionally well-suited to a broad range of business problems. Here are the most impactful use cases:
Fraud & Risk Detection
Identify anomalous transactions or account activity. AutoML handles class imbalance and supports metrics like AUC that matter for rare-event detection.
Customer Churn Prediction
Predict which customers are likely to leave before they do. AutoML rapidly prototypes classification models from CRM data without custom ML engineering.
Demand Forecasting
Forecast inventory, staffing, or energy demand. Azure AutoML supports time-series forecasting with automatic lag detection and seasonality handling.
Healthcare Risk Stratification
Classify patient risk levels for readmission or disease progression — accelerating model development while maintaining clinical interpretability.
Price Optimisation
Regression AutoML models predict optimal price points based on competitor data, seasonality, and demand signals.
Predictive Maintenance
Classify equipment failure likelihood from sensor data. AutoML evaluates tree-based and boosting models best suited for structured IoT data.
3. Business Advantages of Azure AutoML
| Advantage | What it means for your business |
|---|---|
| Speed to insight | Go from raw data to a ranked set of trained models in hours, not weeks |
| No ML expertise required | Business analysts and engineers can run experiments without data science backgrounds |
| Cost efficiency | Pay only for compute used during training; no full-time ML team overhead |
| Reproducibility | Every trial is logged, versioned, and auditable in Azure ML Studio |
| Responsible AI built-in | Feature importance, model explainability, and fairness metrics generated automatically |
| Enterprise-grade MLOps | Best model can be registered, deployed, and monitored directly from the same platform |
4. The Use Case: Bank Account Fraud Detection
For this walkthrough, we use the Bank Account Fraud Dataset (NeurIPS 2022) — a large-scale synthetic dataset with approximately one million transactions and 30 features including payment type, employment status, housing status, device OS, and more.
The target variable is fraud_bool — a binary label indicating whether a bank account application is fraudulent. We use AUC weighted as our primary metric, which accounts for class imbalance and measures the model’s ability to rank fraudulent cases above legitimate ones.
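A quick illustration of why a ranking metric like AUC beats raw accuracy on data this imbalanced — a toy sketch with synthetic labels and scores, not the actual dataset:

```python
# Why accuracy misleads at a ~1% fraud rate, while AUC measures ranking quality.
import numpy as np
from sklearn.metrics import accuracy_score, roc_auc_score

rng = np.random.default_rng(0)
y_true = (rng.random(10_000) < 0.01).astype(int)   # ~1% fraud labels
# A model that tends to score fraud higher than non-fraud
y_score = rng.random(10_000) + 0.3 * y_true

# Predicting "nothing is fraud" is ~99% accurate — and completely useless
always_legit = np.zeros_like(y_true)
print(f"Accuracy of 'never fraud': {accuracy_score(y_true, always_legit):.3f}")
print(f"AUC of the ranking model:  {roc_auc_score(y_true, y_score):.3f}")
```

AUC rewards placing fraudulent cases above legitimate ones regardless of how rare they are, which is exactly the behaviour a fraud model needs.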
5. Registering Data as an MLTable Asset
Azure AutoML requires data registered as an MLTable asset — not a raw file. An MLTable is a versioned, schema-aware data asset that tells Azure how to read, parse, and validate your data before training begins.
Step 1: Re-partition the Parquet file
Azure AutoML enforces a 20 MB row group size limit on Parquet files. Large datasets must be re-partitioned first:
```python
import os

import pyarrow.parquet as pq

table = pq.read_table("path/to/your/data.parquet")

os.makedirs("./train_mltable", exist_ok=True)
pq.write_table(
    table,
    "./train_mltable/train_data.parquet",
    row_group_size=50_000  # 50k rows per group keeps each group well under 20 MB here
)

print(f"Rows: {len(table)}")
print(f"Size: {os.path.getsize('./train_mltable/train_data.parquet') / 1e6:.1f} MB")
```

Step 2: Create the MLTable configuration file
```yaml
# ./train_mltable/MLTable
paths:
  - file: ./train_data.parquet
transformations:
  - read_parquet:
      include_path_column: false
```

Step 3: Register as a versioned data asset
```python
from azure.ai.ml import MLClient
from azure.ai.ml.constants import AssetTypes
from azure.ai.ml.entities import Data
from azure.identity import DefaultAzureCredential

credential = DefaultAzureCredential()
ml_client = MLClient(
    credential,
    subscription_id="<your-subscription-id>",
    resource_group_name="<your-resource-group>",
    workspace_name="<your-workspace>"
)

ml_client.data.create_or_update(
    Data(
        path="./train_mltable",
        type=AssetTypes.MLTABLE,
        name="fraud_train_mltable",
        version="1",
        description="Fraud detection training data"
    )
)
print("MLTable registered successfully")
```

📸 The registered fraud_train_mltable data asset in Azure ML Studio — Version 1, Type: Table.
6. Launching AutoML from Azure Studio (UI Walkthrough)
Azure Machine Learning Studio provides a guided, no-code interface for submitting AutoML jobs. Navigate to Jobs → + New job → Train automatically.
Step 1 — Training Method
Azure offers three training methods. Select Train automatically to launch the AutoML wizard — this submits a fully managed AutoML job without writing a single line of code.
📸 Step 1: Select ‘Train automatically’ to launch the AutoML wizard.
Step 2 — Basic Settings
Assign a meaningful job name and create a new experiment to group related runs. Experiments act as logical containers — all trials from this job will appear grouped under this experiment name in Studio.
📸 Step 2: Job named ‘fraud-automl-classification-UI’ under experiment ‘fraud-detection-automl’.
Step 3 — Task Type & Data
Select Classification as the task type, then choose your registered MLTable asset. Only MLTable-type assets appear — raw URI_FILE assets are filtered out as unsupported. This is why correct data registration matters.
📸 Step 3: Classification task selected with fraud_train_mltable showing as a supported Table asset.
Step 4 — Task Settings
Set the target column to fraud_bool. Key configuration options include:
- Primary metric: AUC weighted — optimal for imbalanced fraud data
- Validation: 5-fold cross validation — each model scored across 5 non-overlapping data splits
- Featurization: Auto — Azure handles encoding, imputation, and scaling automatically
📸 Step 4: Target column set to fraud_bool, with 5-fold cross validation and AUC weighted as primary metric.
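Azure's exact splitting logic is internal, but scikit-learn's `StratifiedKFold` shows the idea behind 5-fold cross validation on imbalanced data: every fold preserves the fraud/non-fraud ratio, so each of the 5 validation scores is computed against a representative sample (toy data below, not the wizard's internals):

```python
# Stratified 5-fold split: the rare class is spread evenly across folds.
import numpy as np
from sklearn.model_selection import StratifiedKFold

y = np.array([0] * 95 + [1] * 5)          # 5% positive class
X = np.arange(len(y)).reshape(-1, 1)      # dummy features

skf = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
fold_frauds = []
for i, (train_idx, val_idx) in enumerate(skf.split(X, y)):
    fold_frauds.append(int(y[val_idx].sum()))
    print(f"Fold {i}: {len(val_idx)} validation rows, {fold_frauds[-1]} fraud case(s)")
```

Without stratification, a random fold on 1%-fraud data could easily contain zero fraud cases, making its AUC score meaningless.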
Step 5 — Compute
Select your Azure ML compute cluster. We use compute1 — a Standard_DS3_v2 with 4 vCPUs, 14 GB RAM at $0.27/hr.
📸 Step 5: compute1 cluster selected — Standard_DS3_v2, 4 vCPUs, 14GB RAM, $0.27/hr.
Step 6 — Review & Submit
The review screen summarises all configuration before submission. Verify task type, data asset, target column, validation strategy, and compute before clicking Submit training job.
📸 Step 6: Final review showing all settings — Classification, fraud_train_mltable, fraud_bool target, 5-fold CV, compute1.
7. The Job in Action
Once submitted, the job overview page shows real-time status. Azure begins by provisioning compute, then starts iterating through algorithm and preprocessing combinations. The Tags panel live-updates with each completed trial’s algorithm, score, and preprocessor.
📸 Live job overview showing Status: Running, Primary metric: AUC weighted, Featurization: Auto, with real-time trial tags.
8. Submitting an AutoML Job via Python SDK
For teams that prefer code-first workflows or need to integrate AutoML into CI/CD pipelines, the Azure ML Python SDK offers full programmatic control:
```python
from azure.ai.ml import Input, MLClient
from azure.ai.ml.automl import classification
from azure.ai.ml.constants import AssetTypes
from azure.identity import DefaultAzureCredential

# 1. Connect to the workspace
credential = DefaultAzureCredential()
ml_client = MLClient(
    credential,
    subscription_id="<your-subscription-id>",
    resource_group_name="<your-resource-group>",
    workspace_name="<your-workspace>"
)

# 2. Reference the registered MLTable
training_data = Input(
    type=AssetTypes.MLTABLE,
    path="azureml:fraud_train_mltable:1"
)

# 3. Configure the AutoML classification job
automl_job = classification(
    compute="compute1",
    experiment_name="fraud-detection-automl",
    training_data=training_data,
    target_column_name="fraud_bool",
    primary_metric="AUC_weighted",
    n_cross_validations=5,
    enable_model_explainability=True,
)

# 4. Set limits to control cost and runtime
automl_job.set_limits(
    timeout_minutes=120,
    trial_timeout_minutes=15,
    max_trials=20,
    max_concurrent_trials=1,
    enable_early_termination=True
)

# 5. Submit
returned_job = ml_client.jobs.create_or_update(automl_job)
print(f"Job submitted: {returned_job.name}")
print(f"Studio URL: {returned_job.studio_url}")
```

Polling Job Status
```python
import time

job_name = returned_job.name

# ml_client.jobs.stream(job_name) blocks until completion and tails the logs;
# the loop below is a lighter-weight alternative for periodic status checks.
while True:
    job = ml_client.jobs.get(job_name)
    print(f"Status: {job.status}")
    if job.status in ["Completed", "Failed", "Canceled"]:
        break
    time.sleep(30)
```

Retrieving & Registering the Best Model
```python
import mlflow
from azure.ai.ml.constants import AssetTypes
from azure.ai.ml.entities import Model

# The best trial's run ID is recorded as an MLflow tag on the parent job
# (requires the mlflow and azureml-mlflow packages)
mlflow.set_tracking_uri(
    ml_client.workspaces.get(ml_client.workspace_name).mlflow_tracking_uri
)
parent_run = mlflow.tracking.MlflowClient().get_run(job_name)
best_child_run_id = parent_run.data.tags["automl_best_child_run_id"]

# Register the best model to the Azure ML registry
registered_model = ml_client.models.create_or_update(
    Model(
        path=f"azureml://jobs/{best_child_run_id}/outputs/artifacts/paths/outputs/mlflow-model/",
        name="fraud-detection-model",
        description="Best AutoML model — fraud detection experiment",
        type=AssetTypes.MLFLOW_MODEL
    )
)
print(f"Registered: {registered_model.name} v{registered_model.version}")
```

9. Interpreting the Results
Once trials complete, the Models + child jobs tab presents a ranked leaderboard of all trained models. Each row shows algorithm, preprocessing scaler, AUC score, training duration, and key hyperparameters.
📸 Models + child jobs tab — 6 trained models ranked by AUC weighted, with algorithm names, durations, and hyperparameters.
In our experiment, the leaderboard was topped by a MaxAbsScaler + LightGBM pipeline at an AUC weighted of 0.893, with the remaining five trials ranked below it.
Best Model Summary
Clicking the top model reveals a full summary — algorithm pipeline, AUC score, sampling percentage, and deployment status. From here you can deploy, download, or explain the model in one click.
📸 Best model detail — MaxAbsScaler + LightGBM, AUC weighted: 0.89301, 100% sampling, ready to deploy or register.
Metrics Dashboard
The metrics tab provides over 15 performance indicators alongside six diagnostic charts: ROC curve, Precision-Recall curve, Calibration curve, Confusion Matrix, Cumulative Gains, and Lift curve.
📸 Metrics dashboard for best model — ROC, Precision-Recall, Calibration, Lift curves, Confusion Matrix, and 15+ metric tiles.
| Metric | Value | What it means |
|---|---|---|
| Accuracy | 0.9888 | Overall correct predictions — misleading for imbalanced data |
| AUC weighted | 0.8930 | Primary metric — model’s ability to rank fraud above non-fraud |
| AUC micro | 0.9975 | Micro-averaged across classes |
| Precision (weighted) | 0.9832 | Of predicted fraud cases, how many are real fraud |
| Recall (macro) | 0.5235 | Of actual fraud cases, how many did the model catch |
| Log loss | 0.0458 | Prediction confidence — lower is better |
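The gap between accuracy (0.9888) and macro recall (0.5235) is the classic imbalance signature: a model can be "almost always right" while still missing half the fraud. A toy confusion matrix makes the effect easy to reproduce (hypothetical numbers, not the experiment's actual predictions):

```python
# High accuracy, mediocre macro recall: the imbalanced-data trap in miniature.
import numpy as np
from sklearn.metrics import accuracy_score, confusion_matrix, recall_score

y_true = np.array([0] * 980 + [1] * 20)
# A model that catches only half of the 20 fraud cases
y_pred = np.concatenate([np.zeros(980, dtype=int),
                         np.array([1] * 10 + [0] * 10)])

print(confusion_matrix(y_true, y_pred))
print(f"Accuracy:       {accuracy_score(y_true, y_pred):.3f}")
print(f"Recall (macro): {recall_score(y_true, y_pred, average='macro'):.3f}")
```

This is why the leaderboard is ranked by AUC weighted rather than accuracy, and why recall deserves a close look before deployment.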
Outputs & Artifacts
Azure AutoML automatically saves a rich set of artifacts for the best model under the Outputs + logs tab — everything needed to reproduce, deploy, or extend the model.
📸 Outputs + logs — model.pkl, mlflow-model/, scoring scripts, featurization_summary.json, pipeline_graph.json, and more.
| Artifact | Purpose |
|---|---|
| model.pkl | Serialised trained model — ready for inference |
| mlflow-model/ | MLflow-packaged model for standardised deployment |
| scoring_file_v1_0_0.py | Auto-generated inference script |
| featurization_summary.json | Transformations applied to each feature |
| conda_env_v1_0_0.yml | Exact environment spec for reproducible inference |
| pipeline_graph.json | Full preprocessing + model pipeline definition |
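Downstream consumption of model.pkl follows the usual pickle round trip. Here is a runnable sketch with a stand-in scikit-learn model; the real artifact must be unpickled inside the environment pinned by conda_env_v1_0_0.yml, and the auto-generated scoring script handles that for you:

```python
# The model.pkl round trip: AutoML writes it, your batch scorer reads it.
import pickle

from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

X, y = make_classification(n_samples=200, random_state=0)
stand_in = LogisticRegression(max_iter=1000).fit(X, y)

with open("model.pkl", "wb") as f:      # what AutoML persists for the best model
    pickle.dump(stand_in, f)

with open("model.pkl", "rb") as f:      # what a scoring script does at inference
    model = pickle.load(f)

probs = model.predict_proba(X[:5])[:, 1]  # per-row fraud probabilities
print(probs.round(3))
```

For production deployment, the mlflow-model/ directory is usually the better entry point, since MLflow packaging carries the environment spec and input signature alongside the weights.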
10. Conclusion
Azure AutoML eliminates the most time-consuming parts of the ML lifecycle — algorithm selection, hyperparameter tuning, and cross-validation — without sacrificing transparency or control. In our fraud detection experiment, six production-quality models were trained, evaluated, and ranked in under 20 minutes of wall-clock time, with the best model achieving an AUC of 0.893 on a highly imbalanced dataset.
For businesses looking to operationalise machine learning without building large data science teams, Azure AutoML offers a compelling combination of speed, governance, and enterprise integration. The full audit trail, versioned artifacts, and built-in explainability make it suitable for regulated industries including banking, insurance, and healthcare.