Hypothesis: the detected network traffic comes from a scraper botnet attempting to exfiltrate data over persistent, low-and-slow connections to command-and-control servers. SOC teams should proactively hunt for this behavior in Microsoft Sentinel (formerly Azure Sentinel) so that potential data exfiltration and botnet command-and-control activity is identified and mitigated early.
Detection Rule
title: Potential Hello-World Scraper Botnet Activity
id: 1712bafe-be05-4a0e-89d4-17a3ed151bf5
status: experimental
description: |
    Detects network traffic potentially associated with a scraper botnet variant that uses the "Hello-World/1.0" user-agent string.
references:
    - https://www.greynoise.io/blog/new-scraper-botnet-concentrated-in-taiwan
    - https://viz.greynoise.io/tags/hello-world-scraper-botnet?days=30
author: Joseph A. M.
date: 2025-08-02
tags:
    - attack.reconnaissance
    - attack.t1595
logsource:
    category: proxy
detection:
    selection:
        c-useragent: 'Hello-World/1.0'
        cs-method: 'GET'
    condition: selection
falsepositives:
    - Legitimate network monitoring or vulnerability scanning tools that may use this generic user agent.
    - Internal development or testing scripts. Consider filtering by source IP if this is expected from certain systems.
level: medium
imWebSession
| where HttpUserAgent =~ "Hello-World/1.0" and HttpRequestMethod =~ "GET"
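To make the matching logic of the rule and query above concrete, here is a minimal Python sketch that applies the same two conditions to already-parsed proxy records. It assumes each record is a dictionary keyed by the HttpUserAgent and HttpRequestMethod field names used in the query; the function name matches_hello_world_rule and the sample records are hypothetical and only serve as illustration.

```python
from typing import Mapping, Optional

TARGET_USER_AGENT = "Hello-World/1.0"
TARGET_METHOD = "GET"


def matches_hello_world_rule(record: Mapping[str, Optional[str]]) -> bool:
    """Return True when a record satisfies both conditions of the rule.

    The comparison is case-insensitive, mirroring the `=~` operator
    used in the query. Missing or empty fields never match.
    """
    user_agent = record.get("HttpUserAgent") or ""
    method = record.get("HttpRequestMethod") or ""
    return (
        user_agent.casefold() == TARGET_USER_AGENT.casefold()
        and method.casefold() == TARGET_METHOD.casefold()
    )


# Hypothetical sample records, not real telemetry.
sample_records = [
    {"HttpUserAgent": "Hello-World/1.0", "HttpRequestMethod": "GET"},
    {"HttpUserAgent": "hello-world/1.0", "HttpRequestMethod": "get"},
    {"HttpUserAgent": "Hello-World/1.0", "HttpRequestMethod": "POST"},
    {"HttpUserAgent": "curl/8.5.0", "HttpRequestMethod": "GET"},
]

# Keep only the records that the rule would flag.
hits = [r for r in sample_records if matches_hello_world_rule(r)]
```

Only the first two sample records satisfy both conditions; the POST request and the curl user agent are ignored, which is the same behavior expected from the query.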
Scenario: Developer Testing a Web Scraper Tool
Description: A developer is testing a legitimate web scraping tool (e.g., Scrapy, BeautifulSoup, or Selenium) to extract data from a public website for a project.
Filter/Exclusion: Exclude traffic originating from known development environments (e.g., source IP addresses in the 10.0.0.0/8 range). Because the rule matches only the exact "Hello-World/1.0" user agent, excluding other user-agent strings (e.g., a browser string such as Mozilla/5.0 (X11; Linux x86_64; rv:123.0) Gecko/20100101 Firefox/123.0) has no effect on its results.
Scenario: Scheduled Job for Data Aggregation
Description: A system administrator has configured a scheduled job (e.g., using cron or Task Scheduler) to run a script that scrapes data from the company’s internal portal for reporting purposes.
Filter/Exclusion: Exclude traffic from hosts that run known scheduled jobs (e.g., cron, systemd timers, or Windows Task Scheduler) by source IP. Process names are typically not present in proxy logs, so filtering on them would require correlating with endpoint telemetry.
Scenario: Admin Performing Manual Data Extraction
Description: An admin is manually extracting data from a public API (e.g., using curl or Postman) to populate a local database for a reporting tool.
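Filter/Exclusion: Exclude traffic from admin workstations (e.g., source IP addresses in the 192.168.1.0/24 range). Command-line tools such as curl, wget, or httpie send their own default user agents, so they match this rule only when explicitly configured to send "Hello-World/1.0".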
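The source-IP exclusions suggested in the scenarios above could be sketched as follows. This is a minimal illustration, assuming the two example ranges from the text (10.0.0.0/8 and 192.168.1.0/24) are the allowlisted networks in a given environment; the names ALLOWLISTED_NETWORKS and is_allowlisted_source are hypothetical, and real ranges should come from an asset inventory rather than from this example.

```python
import ipaddress

# Assumption: these example ranges stand in for known development
# environments and admin workstations. Replace with real inventory data.
ALLOWLISTED_NETWORKS = [
    ipaddress.ip_network("10.0.0.0/8"),
    ipaddress.ip_network("192.168.1.0/24"),
]


def is_allowlisted_source(src_ip: str) -> bool:
    """Return True when src_ip falls inside one of the allowlisted networks.

    Unparseable values return False so that malformed log entries are
    still surfaced to the analyst instead of being silently excluded.
    """
    try:
        address = ipaddress.ip_address(src_ip)
    except ValueError:
        return False
    return any(address in network for network in ALLOWLISTED_NETWORKS)
```

A hit from the detection would then be suppressed only when is_allowlisted_source returns True for its source address. Broad ranges such as 10.0.0.0/8 can also hide compromised internal hosts, so narrower ranges are usually the safer choice.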
Scenario: CI/CD Pipeline Fetching Public Dependencies
Description: A CI/CD pipeline (e.g., Jenkins, GitHub Actions) is fetching public dependencies or metadata from external