What if you could train, evaluate, and select the best machine learning model for your business problem — without writing a single line of ML code? Azure AutoML makes this a reality. In this post, we walk through a real fraud detection use case end-to-end, from data registration to model evaluation, using both the Azure Studio UI and the Python SDK.

1. What is Automated Machine Learning?

Machine learning model development is traditionally a resource-intensive process. Data scientists spend weeks selecting algorithms, engineering features, tuning hyperparameters, and validating results. For businesses without large ML teams, this creates a significant barrier to entry.

Automated Machine Learning (AutoML) solves this by systematically iterating through combinations of algorithms, preprocessing steps, and hyperparameters — selecting the best-performing pipeline based on a metric you define. Azure AutoML, part of Microsoft’s Azure Machine Learning platform, industrialises this process at scale.

Under the hood, Azure AutoML handles three major phases automatically:

  • Featurization — imputing missing values, encoding categoricals, normalising numerics, detecting data types
  • Algorithm sweeping — trying LightGBM, XGBoost, RandomForest, ExtremeRandomTrees, and more
  • Ensembling — optionally combining top models into a voting or stacking ensemble for higher accuracy

Key insight: Azure AutoML doesn’t just pick one model. It runs a full experiment of trials, each with a different algorithm and preprocessing combination, scored with the validation strategy you choose (5-fold cross-validation in this walkthrough). You get a ranked leaderboard of all trials, not just the winner.
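
To make the sweep-and-rank idea concrete, here is a framework-free Python sketch of cross-validated model selection followed by soft voting (a weighted average of predicted probabilities). It is a conceptual illustration, not Azure's implementation: the folds are contiguous rather than shuffled or stratified, the scoring function is supplied by the caller, and kfold_indices, run_sweep, soft_vote and the pipeline names are hypothetical.

```python
from statistics import mean
from typing import Callable, Dict, List, Sequence, Tuple

Split = Tuple[List[int], List[int]]


def kfold_indices(n_rows: int, k: int) -> List[Split]:
    """Split row indices into k contiguous, non-overlapping validation folds.
    Each (train, valid) pair holds out one fold; every row is validated once.
    Real pipelines usually shuffle or stratify first; omitted for clarity."""
    sizes = [n_rows // k + (1 if i < n_rows % k else 0) for i in range(k)]
    splits: List[Split] = []
    start = 0
    for size in sizes:
        valid = list(range(start, start + size))
        train = list(range(0, start)) + list(range(start + size, n_rows))
        splits.append((train, valid))
        start += size
    return splits


def run_sweep(candidates: Sequence[str],
              score_fn: Callable[[str, List[int], List[int]], float],
              n_rows: int, k: int = 5) -> List[Tuple[str, float]]:
    """Score each candidate pipeline on every fold; rank by mean fold score."""
    splits = kfold_indices(n_rows, k)
    board = [(name, mean(score_fn(name, tr, va) for tr, va in splits))
             for name in candidates]
    return sorted(board, key=lambda row: row[1], reverse=True)


def soft_vote(probas: Dict[str, List[float]], weights: Dict[str, float]) -> List[float]:
    """Weighted average of each model's positive-class probabilities, row by row."""
    total = sum(weights.values())
    n = len(next(iter(probas.values())))
    return [sum(w * probas[m][i] for m, w in weights.items()) / total for i in range(n)]


# Usage with hypothetical, constant fold scores standing in for real training
fold_scores = {"pipeline_a": 0.81, "pipeline_b": 0.86}
leaderboard = run_sweep(list(fold_scores), lambda name, tr, va: fold_scores[name], n_rows=100)
print(leaderboard)
print(soft_vote({"pipeline_a": [0.2, 0.9], "pipeline_b": [0.4, 0.7]},
                {"pipeline_a": 1.0, "pipeline_b": 1.0}))
```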

2. When Should Businesses Use AutoML?

AutoML is exceptionally well-suited to a broad range of business problems. Here are the most impactful use cases:

  • 🏦 Fraud & Risk Detection: Identify anomalous transactions or account activity. AutoML detects class imbalance through its data guardrails and lets you optimise for metrics such as AUC weighted that suit rare-event detection.
  • 📉 Customer Churn Prediction: Predict which customers are likely to leave before they do. AutoML rapidly prototypes classification models from CRM data without custom ML engineering.
  • 📦 Demand Forecasting: Forecast inventory, staffing, or energy demand. Azure AutoML supports time-series forecasting with automatic lag detection and seasonality handling.
  • 🏥 Healthcare Risk Stratification: Classify patient risk levels for readmission or disease progression, accelerating model development while keeping models interpretable for clinicians.
  • 🛒 Price Optimisation: AutoML regression models estimate price points, or demand at a given price, from competitor data, seasonality, and demand signals to inform pricing decisions.
  • 🏗️ Predictive Maintenance: Classify equipment failure likelihood from sensor data. AutoML evaluates the tree-based and boosting models that typically suit structured IoT data.

3. Business Advantages of Azure AutoML

Advantage | What it means for your business
Speed to insight | Go from raw data to a ranked set of trained models in hours, not weeks
No ML expertise required | Business analysts and engineers can run experiments without a data science background
Cost efficiency | Pay only for compute used during training, without the overhead of a full-time ML team
Reproducibility | Every trial is logged, versioned, and auditable in Azure ML Studio
Responsible AI built-in | Model explanations and feature importance are generated for the best model; fairness analysis requires separate tooling
Enterprise-grade MLOps | The best model can be registered, deployed, and monitored directly from the same platform

4. The Use Case: Bank Account Fraud Detection

For this walkthrough, we use the Bank Account Fraud Dataset (NeurIPS 2022), a large-scale synthetic dataset with approximately one million bank account applications and 30 features, including payment type, employment status, housing status, device OS, and more.

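The target variable is fraud_bool, a binary label indicating whether a bank account application is fraudulent. We use AUC weighted as our primary metric. AUC measures how well the model ranks fraudulent cases above legitimate ones and does not depend on a decision threshold, which makes it far more informative than accuracy when fraud is rare. For a binary task, AUC weighted (the per-class AUCs averaged by class frequency) equals the standard ROC AUC.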
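
A small pure-Python sketch shows both points: accuracy can look excellent while AUC reveals a useless model, and AUC weighted collapses to plain ROC AUC for two classes. AUC is computed here as the probability that a random positive is scored above a random negative (ties count half); the labels and scores are made-up toy values, and roc_auc, auc_weighted_binary and accuracy are hypothetical helper names.

```python
from typing import Sequence


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """P(random positive scored above random negative); ties count as half.
    O(P*N) pairwise comparison, fine for illustration only."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def auc_weighted_binary(labels: Sequence[int], p_fraud: Sequence[float]) -> float:
    """One-vs-rest AUC for each class, averaged with class-frequency weights."""
    auc_fraud = roc_auc(labels, p_fraud)
    auc_legit = roc_auc([1 - y for y in labels], [1 - p for p in p_fraud])
    n_fraud = sum(labels)
    n_legit = len(labels) - n_fraud
    return (n_fraud * auc_fraud + n_legit * auc_legit) / len(labels)


def accuracy(labels: Sequence[int], preds: Sequence[int]) -> float:
    return sum(y == p for y, p in zip(labels, preds)) / len(labels)


# Toy data: 8 legitimate rows, 2 fraudulent rows
labels = [0, 0, 0, 0, 0, 0, 0, 0, 1, 1]
scores = [0.1, 0.2, 0.15, 0.3, 0.05, 0.4, 0.25, 0.35, 0.6, 0.3]
print(roc_auc(labels, scores), auc_weighted_binary(labels, scores))  # 0.84375 0.84375

# A "model" that always says legitimate on 1% fraud data
rare = [0] * 99 + [1]
print(accuracy(rare, [0] * 100), roc_auc(rare, [0.0] * 100))  # 0.99 0.5
```

The second print is the imbalance trap in miniature: 99% accuracy from a model that never catches fraud, while its AUC of 0.5 correctly marks it as no better than chance.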

5. Registering Data as an MLTable Asset

Azure AutoML requires data registered as an MLTable asset — not a raw file. An MLTable is a versioned, schema-aware data asset that tells Azure how to read, parse, and validate your data before training begins.

Step 1: Rewrite the Parquet file with smaller row groups

Azure AutoML enforces a 20 MB row group size limit on Parquet files, so large files must be rewritten with smaller row groups first:

import pyarrow.parquet as pq
import os

table = pq.read_table("path/to/your/data.parquet")

os.makedirs("./train_mltable", exist_ok=True)

pq.write_table(
    table,
    "./train_mltable/train_data.parquet",
    row_group_size=50000   # rows per group, not bytes; expected to stay well under 20 MB for this ~30-column table
)
print(f"Rows: {len(table)}")
print(f"Size: {os.path.getsize('./train_mltable/train_data.parquet') / 1e6:.1f} MB")
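
If you would rather not hard-code 50,000 rows, a rough sizing rule can derive row_group_size from the table's size. This sketch assumes bytes are spread evenly across rows and applies an arbitrary 50% safety margin; pick_row_group_size and largest_row_group_bytes are hypothetical helpers, and the 20 MB figure is the limit quoted in this post.

```python
def pick_row_group_size(total_bytes: int, num_rows: int,
                        limit_bytes: int = 20_000_000, safety: float = 0.5) -> int:
    """Rows per group so an average group uses at most safety * limit_bytes,
    assuming bytes are spread evenly across rows (an approximation)."""
    if total_bytes <= 0 or num_rows <= 0:
        raise ValueError("total_bytes and num_rows must be positive")
    bytes_per_row = total_bytes / num_rows
    return max(1, int(limit_bytes * safety // bytes_per_row))


def largest_row_group_bytes(path: str) -> int:
    """Largest row group in a written Parquet file, using the footer's
    total_byte_size (uncompressed column data). pyarrow is imported lazily
    so the pure-Python helper above works without it."""
    import pyarrow.parquet as pq
    meta = pq.ParquetFile(path).metadata
    return max(meta.row_group(i).total_byte_size for i in range(meta.num_row_groups))


# Example: a 300 MB in-memory table with one million rows
print(pick_row_group_size(300_000_000, 1_000_000))
```

In the script above you would call pick_row_group_size(table.nbytes, table.num_rows) before pq.write_table, then check the output with largest_row_group_bytes("./train_mltable/train_data.parquet"). Arrow's in-memory nbytes is usually larger than the compressed on-disk size, so the estimate errs on the conservative side.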

Step 2: Create the MLTable configuration file

# ./train_mltable/MLTable
paths:
  - file: ./train_data.parquet
transformations:
  - read_parquet:
      include_path_column: false
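
Generating the MLTable file from the same script that writes the Parquet file keeps the file name in both places in sync. A minimal stdlib-only sketch, assuming the YAML above is all you need; write_mltable is a hypothetical helper, not part of the Azure ML SDK.

```python
from pathlib import Path

# Same content as the MLTable YAML above, with the Parquet file name templated
MLTABLE_TEMPLATE = """\
paths:
  - file: ./{parquet_name}
transformations:
  - read_parquet:
      include_path_column: false
"""


def write_mltable(folder: str, parquet_name: str) -> Path:
    """Create <folder>/MLTable (no file extension) pointing at parquet_name.
    Fails early if the Parquet file is not in the folder yet."""
    target_dir = Path(folder)
    target_dir.mkdir(parents=True, exist_ok=True)
    if not (target_dir / parquet_name).exists():
        raise FileNotFoundError(f"{parquet_name} not found in {target_dir}")
    out = target_dir / "MLTable"
    out.write_text(MLTABLE_TEMPLATE.format(parquet_name=parquet_name), encoding="utf-8")
    return out
```

Calling write_mltable("./train_mltable", "train_data.parquet") after Step 1 produces the file shown above. The file must be named exactly MLTable, with no extension.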

Step 3: Register as a versioned data asset

from azure.ai.ml import MLClient
from azure.identity import DefaultAzureCredential
from azure.ai.ml.entities import Data
from azure.ai.ml.constants import AssetTypes

credential = DefaultAzureCredential()
ml_client = MLClient(
    credential,
    subscription_id="<your-subscription-id>",
    resource_group_name="<your-resource-group>",
    workspace_name="<your-workspace>"
)

ml_client.data.create_or_update(
    Data(
        path="./train_mltable",
        type=AssetTypes.MLTABLE,
        name="fraud_train_mltable",
        version="1",
        description="Fraud detection training data"
    )
)
print("MLTable registered successfully")

Why MLTable? Unlike a raw URI file, an MLTable describes how to load the files as a table (paths, file format, and read transformations). Azure AutoML relies on this to read the data consistently, infer column types, and apply featurization correctly before a single model is trained.

📸 The registered fraud_train_mltable data asset in Azure ML Studio — Version 1, Type: Table.

6. Launching AutoML from Azure Studio (UI Walkthrough)

Azure Machine Learning Studio provides a guided, no-code interface for submitting AutoML jobs. Navigate to Jobs → + New job → Train automatically.

Step 1 — Training Method

Azure offers three training methods. Select Train automatically to launch the AutoML wizard — this submits a fully managed AutoML job without writing a single line of code.

📸 Step 1: Select ‘Train automatically’ to launch the AutoML wizard.

Step 2 — Basic Settings

Assign a meaningful job name and create a new experiment to group related runs. Experiments act as logical containers — all trials from this job will appear grouped under this experiment name in Studio.

📸 Step 2: Job named ‘fraud-automl-classification-UI’ under experiment ‘fraud-detection-automl’.

Step 3 — Task Type & Data

Select Classification as the task type, then choose your registered MLTable asset. Only MLTable-type assets appear — raw URI_FILE assets are filtered out as unsupported. This is why correct data registration matters.

📸 Step 3: Classification task selected with fraud_train_mltable showing as a supported Table asset.

Step 4 — Task Settings

Set the target column to fraud_bool. Key configuration options include:

  • Primary metric: AUC weighted, well suited to imbalanced fraud data
  • Validation: 5-fold cross-validation; the data is split into 5 non-overlapping folds, and each model is trained 5 times, each time scored on a different held-out fold
  • Featurization: Auto; Azure handles encoding, imputation, and scaling automatically

📸 Step 4: Target column set to fraud_bool, with 5-fold cross validation and AUC weighted as primary metric.

Step 5 — Compute

Select your Azure ML compute cluster. We use compute1 — a Standard_DS3_v2 with 4 vCPUs, 14 GB RAM at $0.27/hr.

📸 Step 5: compute1 cluster selected — Standard_DS3_v2, 4 vCPUs, 14GB RAM, $0.27/hr.

Step 6 — Review & Submit

The review screen summarises all configuration before submission. Verify task type, data asset, target column, validation strategy, and compute before clicking Submit training job.

📸 Step 6: Final review showing all settings — Classification, fraud_train_mltable, fraud_bool target, 5-fold CV, compute1.

7. The Job in Action

Once submitted, the job overview page shows real-time status. Azure begins by provisioning compute, then starts iterating through algorithm and preprocessing combinations. The Tags panel live-updates with each completed trial’s algorithm, score, and preprocessor.

📸 Live job overview showing Status: Running, Primary metric: AUC weighted, Featurization: Auto, with real-time trial tags.

8. Submitting an AutoML Job via Python SDK

For teams that prefer code-first workflows or need to integrate AutoML into CI/CD pipelines, the Azure ML Python SDK offers full programmatic control:

from azure.ai.ml import MLClient, Input
from azure.ai.ml.automl import classification
from azure.ai.ml.constants import AssetTypes
from azure.identity import DefaultAzureCredential

# 1. Connect to workspace
credential = DefaultAzureCredential()
ml_client = MLClient(
    credential,
    subscription_id="<your-subscription-id>",
    resource_group_name="<your-resource-group>",
    workspace_name="<your-workspace>"
)

# 2. Reference your registered MLTable
training_data = Input(
    type=AssetTypes.MLTABLE,
    path="azureml:fraud_train_mltable:1"
)

# 3. Configure the AutoML classification job
automl_job = classification(
    compute="compute1",
    experiment_name="fraud-detection-automl",
    training_data=training_data,
    target_column_name="fraud_bool",
    primary_metric="AUC_weighted",
    n_cross_validations=5,
    enable_model_explainability=True,
)

# 4. Set limits to control cost and runtime
automl_job.set_limits(
    timeout_minutes=120,
    trial_timeout_minutes=15,
    max_trials=20,
    max_concurrent_trials=1,
    enable_early_termination=True
)

# 5. Submit
returned_job = ml_client.jobs.create_or_update(automl_job)
print(f"Job submitted: {returned_job.name}")
print(f"Studio URL: {returned_job.studio_url}")
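
Before submitting, it can help to bound what these limits allow. The arithmetic below is a back-of-envelope sketch under stated assumptions: each concurrent trial occupies one node, every trial may run up to trial_timeout_minutes, and setup, featurization, ensembling and cluster scale-down time are ignored. The $0.27/hr rate is the Standard_DS3_v2 price quoted earlier; prices vary by region. Both function names are hypothetical.

```python
import math


def worst_case_minutes(timeout_minutes: int, trial_timeout_minutes: int,
                       max_trials: int, max_concurrent_trials: int) -> int:
    """Trials run in ceil(max_trials / concurrency) waves, capped by the job timeout."""
    waves = math.ceil(max_trials / max_concurrent_trials)
    return min(timeout_minutes, waves * trial_timeout_minutes)


def worst_case_cost(minutes: float, nodes: int, price_per_node_hour: float) -> float:
    """Node-hours multiplied by the hourly price."""
    return minutes / 60 * nodes * price_per_node_hour


# Limits from the set_limits call above: min(120, 20 waves * 15 min) = 120
minutes = worst_case_minutes(120, 15, 20, 1)
cost = worst_case_cost(minutes, nodes=1, price_per_node_hour=0.27)
print(f"Upper bound: {minutes} min, about ${cost:.2f}")
```

With max_concurrent_trials=1, the job-level timeout rather than the trial count is the binding limit, so raising max_trials alone would not increase the worst-case bill.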

Polling Job Status

import time

job_name = returned_job.name
while True:
    job = ml_client.jobs.get(job_name)
    print(f"Status: {job.status}")
    if job.status in ["Completed", "Failed", "Canceled"]:
        break
    time.sleep(30)
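
The loop above has no overall deadline and polls at a fixed rate. A slightly more defensive sketch adds a timeout and exponential backoff; wait_for_terminal is a hypothetical helper, the terminal status names are copied from the loop above, and the timeout counts only time spent sleeping, not API call latency.

```python
import time
from typing import Callable, Iterable


def wait_for_terminal(get_status: Callable[[], str],
                      terminal: Iterable[str] = ("Completed", "Failed", "Canceled"),
                      interval_s: float = 30.0, max_interval_s: float = 300.0,
                      timeout_s: float = 4 * 3600.0,
                      sleep: Callable[[float], None] = time.sleep) -> str:
    """Return the first terminal status seen; raise TimeoutError after timeout_s.
    The wait between polls doubles up to max_interval_s to limit API calls."""
    terminal_set = set(terminal)
    waited = 0.0
    while True:
        status = get_status()
        if status in terminal_set:
            return status
        if waited >= timeout_s:
            raise TimeoutError(f"still '{status}' after {waited:.0f}s")
        sleep(interval_s)
        waited += interval_s
        interval_s = min(interval_s * 2, max_interval_s)
```

With the objects from this section you would call wait_for_terminal(lambda: ml_client.jobs.get(job_name).status). The SDK also provides ml_client.jobs.stream(job_name), which streams the job's logs and blocks until it finishes.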

Retrieving & Registering the Best Model

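import mlflow
from mlflow.tracking import MlflowClient
from azure.ai.ml.entities import Model
from azure.ai.ml.constants import AssetTypes

# Point MLflow at the workspace tracking server (requires the azureml-mlflow package)
workspace = ml_client.workspaces.get(ml_client.workspace_name)
mlflow.set_tracking_uri(workspace.mlflow_tracking_uri)
mlflow_client = MlflowClient()

# AutoML records the best trial's run ID as a tag on the parent run
parent_run = mlflow_client.get_run(job_name)
best_run_id = parent_run.data.tags["automl_best_child_run_id"]

# Register the best trial's MLflow model folder as a versioned model
registered_model = ml_client.models.create_or_update(
    Model(
        path=f"azureml://jobs/{best_run_id}/outputs/artifacts/outputs/mlflow-model/",
        name="fraud-detection-model",
        description="Best AutoML model, fraud detection experiment",
        type=AssetTypes.MLFLOW_MODEL
    )
)
print(f"Registered: {registered_model.name} v{registered_model.version}")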

9. Interpreting the Results

Once trials complete, the Models + child jobs tab presents a ranked leaderboard of all trained models. Each row shows algorithm, preprocessing scaler, AUC score, training duration, and key hyperparameters.

Models + child jobs tab — 6 trained models ranked by AUC weighted, with algorithm names, durations, and hyperparameters.

In our experiment, the top-ranked model was a MaxAbsScaler + LightGBM pipeline with an AUC weighted of 0.893.

Best Model Summary

Clicking the top model reveals a full summary — algorithm pipeline, AUC score, sampling percentage, and deployment status. From here you can deploy, download, or explain the model in one click.

Best model detail — MaxAbsScaler + LightGBM, AUC weighted: 0.89301, 100% sampling, ready to deploy or register.

Metrics Dashboard

The Metrics tab shows more than 15 performance indicators alongside six diagnostic charts: ROC curve, Precision-Recall curve, Calibration curve, Confusion Matrix, Cumulative Gains, and Lift curve.

Metrics dashboard for best model — ROC, Precision-Recall, Calibration, Lift curves, Confusion Matrix, and 15+ metric tiles.

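Metric | Value | What it means
Accuracy | 0.9888 | Share of all predictions that are correct; misleading for imbalanced data, since predicting "not fraud" everywhere already scores highly
AUC weighted | 0.8930 | Primary metric; how well the model ranks fraud above non-fraud (equal to standard ROC AUC for a binary task)
AUC micro | 0.9975 | AUC computed after pooling both classes' one-vs-rest predictions; dominated by the majority class
Precision (weighted) | 0.9832 | Per-class precision averaged by class frequency; dominated by the non-fraud class, so it is not the precision on fraud cases
Recall (macro) | 0.5235 | Unweighted mean of fraud recall and non-fraud recall; a low value here means many fraud cases are missed at the default threshold
Log loss | 0.0458 | Penalises confident but wrong probability estimates; lower is better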
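
Because macro recall for a binary task is the unweighted mean of fraud recall and non-fraud recall, the 0.5235 in the table pins down fraud recall once you assume a value for non-fraud recall. The assumed values below are illustrative, not reported metrics, and implied_fraud_recall is a hypothetical helper.

```python
def implied_fraud_recall(macro_recall: float, non_fraud_recall: float) -> float:
    """Binary macro recall = (fraud + non_fraud) / 2, so solve for fraud recall."""
    return 2 * macro_recall - non_fraud_recall


macro = 0.5235  # Recall (macro) from the table above
# Non-fraud recall cannot exceed 1.0, so this is the lowest possible fraud recall
print(f"lower bound: {implied_fraud_recall(macro, 1.0):.3f}")
for assumed in (0.999, 0.995):  # assumed values, not measured
    print(f"non-fraud recall {assumed}: fraud recall {implied_fraud_recall(macro, assumed):.3f}")
```

If non-fraud recall is close to 1.0, as the high accuracy suggests, the model catches only around 5% of fraud cases at its default decision threshold despite a strong AUC. Lowering the threshold trades precision for recall.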

Outputs & Artifacts

Azure AutoML automatically saves a rich set of artifacts for the best model under the Outputs + logs tab — everything needed to reproduce, deploy, or extend the model.

Outputs + logs — model.pkl, mlflow-model/, scoring scripts, featurization_summary.json, pipeline_graph.json, and more.

Artifact | Purpose
model.pkl | Serialised trained model, ready for inference
mlflow-model/ | MLflow-packaged model for standardised deployment
scoring_file_v1_0_0.py | Auto-generated inference script
featurization_summary.json | Transformations applied to each feature
conda_env_v1_0_0.yml | Exact environment spec for reproducible inference
pipeline_graph.json | Full preprocessing + model pipeline definition

Pro tip: Click View generated code on any child run to see the full Python code that reproduces the exact pipeline AutoML selected, bridging no-code AutoML and custom ML engineering.

10. Conclusion

Azure AutoML eliminates the most time-consuming parts of the ML lifecycle (algorithm selection, hyperparameter tuning, and cross-validation) without sacrificing transparency or control. In our fraud detection experiment, six candidate models were trained, evaluated, and ranked in under 20 minutes of wall-clock time, with the best model achieving an AUC weighted of 0.893 on a highly imbalanced dataset.

For businesses looking to operationalise machine learning without building large data science teams, Azure AutoML offers a compelling combination of speed, governance, and enterprise integration. The full audit trail, versioned artifacts, and built-in explainability make it suitable for regulated industries including banking, insurance, and healthcare.
