Users may be exfiltrating data by returning amounts significantly higher than their daily average, indicating potential data theft or reconnaissance activities. SOC teams should proactively hunt for this behavior in Azure Sentinel to identify and mitigate potential data exfiltration attempts early.
KQL Query
let starttime = todatetime('{{StartTimeISO}}');
let endtime = todatetime('{{EndTimeISO}}');
let lookback = totimespan((endtime-starttime)*7);
let threshold = 10;
let baseline = 10000;
let diff = 5;
let anomolous_users = (
LAQueryLogs
| where TimeGenerated between(startofday(ago(lookback))..starttime)
| where isnotempty(ResponseRowCount)
| summarize score=sum(ResponseRowCount) by AADEmail
| join kind = fullouter (LAQueryLogs
| where TimeGenerated between(starttime..endtime)
| summarize score_now=sum(ResponseRowCount) by AADEmail)
on AADEmail
| extend hist_score = iif((score/29)*threshold > baseline, (score/29)*threshold, baseline)
| where isnotempty(score)
| where score_now > hist_score*diff
| project AADEmail);
LAQueryLogs
| where TimeGenerated between(starttime..endtime)
| where AADEmail in(anomolous_users)
| extend timestamp = TimeGenerated, AccountCustomEntity = AADEmail
// Comment out the line below to see the queries run by users.
| summarize total_rows = sum(ResponseRowCount), NoQueries = count(), AvgQuerySize = sum(ResponseRowCount)/count() by AADEmail
id: 8699df3f-f89e-431f-9dea-056c4ce7014a
name: User returning more data than daily average
description: |
'This hunting query looks for users whose total returned data that is significantly above their average.'
requiredDataConnectors:
- connectorId: AzureMonitor(Query Audit)
dataTypes:
- LAQueryLogs
tactics:
- Exfiltration
relevantTechniques:
- T1030
query: |
let starttime = todatetime('{{StartTimeISO}}');
let endtime = todatetime('{{EndTimeISO}}');
let lookback = totimespan((endtime-starttime)*7);
let threshold = 10;
let baseline = 10000;
let diff = 5;
let anomolous_users = (
LAQueryLogs
| where TimeGenerated between(startofday(ago(lookback))..starttime)
| where isnotempty(ResponseRowCount)
| summarize score=sum(ResponseRowCount) by AADEmail
| join kind = fullouter (LAQueryLogs
| where TimeGenerated between(starttime..endtime)
| summarize score_now=sum(ResponseRowCount) by AADEmail)
on AADEmail
| extend hist_score = iif((score/29)*threshold > baseline, (score/29)*threshold, baseline)
| where isnotempty(score)
| where score_now > hist_score*diff
| project AADEmail);
LAQueryLogs
| where TimeGenerated between(starttime..endtime)
| where AADEmail in(anomolous_users)
| extend timestamp = TimeGenerated, AccountCustomEntity = AADEmail
// Comment out the line below to see the queries run by users.
| summarize total_rows = sum(ResponseRowCount), NoQueries = count(), AvgQuerySize = sum(ResponseRowCount)/count() by AADEmail
version: 1.0.0
metadata:
source:
kind: Community
author:
name: Pete Bryan
support:
tier: Microsoft
categories:
domains: [ "Security - Threat Protection" ]
Scenario: A system administrator is performing a routine log analysis using ELK Stack (Elasticsearch, Logstash, Kibana) to investigate a potential security incident.
Filter/Exclusion: Exclude users with the admin or sysadmin role, or add a condition to filter out queries executed via Kibana or Logstash.
Scenario: A scheduled job runs daily to archive old data using Apache NiFi or Airflow, which temporarily increases the amount of data returned by the user account running the job.
Filter/Exclusion: Exclude jobs associated with Airflow or NiFi by checking the job name or source, or add a time-based filter to ignore data returned during scheduled maintenance windows.
Scenario: A database backup process is initiated using MySQL or PostgreSQL tools like pg_dump or mysqldump, which returns a large amount of data as part of the backup operation.
Filter/Exclusion: Exclude queries that match known backup tools or include specific command-line arguments like --backup or --dump.
Scenario: A reporting tool like Tableau or Power BI is used by a business analyst to generate a large dataset for a report, which temporarily spikes the data returned by the user.
Filter/Exclusion: Exclude users associated with Tableau Server or Power BI Service by checking the user’s authentication source or add a filter for specific report generation times.
Scenario: A data migration or ETL process is executed using Apache Spark or Informatica, which involves large data transfers and temporarily increases the data returned by the user.
Filter/Exclusion: Exclude users associated with Spark or Informatica by checking the job ID or source, or add