Analysis Tasks

KCPilot’s analysis tasks are AI-powered diagnostic checks that examine your Kafka cluster’s configuration, performance, and health. Each task is defined as a YAML file containing prompts and rules that guide the AI analysis engine to identify specific issues and provide remediation guidance.

Overview

Analysis tasks use Large Language Models (LLMs) to analyze collected cluster data. Unlike static rule-based checks, they can take context into account, identify patterns across data sources, and tailor recommendations to your specific cluster configuration.

Key Features

AI-Powered Analysis: Uses OpenAI or compatible LLMs to analyze complex cluster states
Configurable Severity: Automatically maps findings to critical, warning, or info levels
Data Filtering: Each task specifies which data types it needs (config, logs, metrics, admin)
Actionable Remediation: Provides specific steps to resolve identified issues

How to Use Analysis Tasks

List Available Tasks

To see all available analysis tasks with their descriptions:

# List all tasks
kcpilot task list

# List with detailed information
kcpilot task list --detailed

Execute a Single Task

To run a specific analysis task on collected scan data:

# Test a single task
kcpilot task test <task-id> <snapshot-path>

# Example: Test JVM heap configuration
kcpilot task test jvm_heap_memory_ratio ./scan-2024-01-15

# With debug logging
RUST_LOG=kcpilot=debug kcpilot task test <task-id> <snapshot-path>
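
To run several tasks against the same snapshot, the command above can be wrapped in a small script. The following is a minimal sketch, not part of KCPilot: the helper names build_task_test_command and run_tasks are hypothetical, and the only CLI usage assumed is the kcpilot task test <task-id> <snapshot-path> form and the RUST_LOG setting shown above. The meaning of kcpilot's exit codes is not documented here, so the sketch only records them.

```python
import os
import subprocess


def build_task_test_command(task_id, snapshot_path):
    """Return the argv list for `kcpilot task test <task-id> <snapshot-path>`."""
    return ["kcpilot", "task", "test", task_id, str(snapshot_path)]


def run_tasks(task_ids, snapshot_path, debug=False):
    """Run each task in turn and return a {task_id: exit_code} mapping.

    Hypothetical helper: it assumes `kcpilot` is on PATH and makes no
    assumption about what a particular exit code means.
    """
    env = dict(os.environ)
    if debug:
        # Same setting as the "With debug logging" command above.
        env["RUST_LOG"] = "kcpilot=debug"
    results = {}
    for task_id in task_ids:
        completed = subprocess.run(
            build_task_test_command(task_id, snapshot_path), env=env
        )
        results[task_id] = completed.returncode
    return results
```

For example, run_tasks(["jvm_heap_memory_ratio", "rack_awareness"], "./scan-2024-01-15", debug=True) would invoke both tasks one after the other with debug logging enabled.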

Run Full Analysis

To execute all analysis tasks on a snapshot:

# Analyze with terminal and markdown reports
kcpilot analyze <snapshot-path> --report terminal,markdown

# Example
kcpilot analyze ./scan-2024-01-15 --report terminal,markdown --output analysis-report.md

Creating Custom Tasks

Analysis tasks are YAML files stored in the analysis_tasks/ directory. Each task includes:

Metadata: ID, name, description, and category
Prompt: The analysis instruction sent to the LLM
Data Selection: Which data types to include via include_data
Severity Mapping: Keywords that determine finding severity levels
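
Before dropping a new file into analysis_tasks/, it can help to sanity-check its structure. The sketch below is an illustration, not KCPilot's own validation: it works on a task that has already been parsed into a Python dict, uses the field names from the task template in this document, and assumes (without confirmation from KCPilot) that every field is mandatory and that the data types and severity levels named here are the only valid ones.

```python
# Field names follow the task template; treating all of them as required
# is an assumption made for this sketch.
REQUIRED_FIELDS = (
    "id", "name", "description", "category",
    "prompt", "include_data", "severity_keywords",
)
KNOWN_DATA_TYPES = {"config", "logs", "metrics", "admin"}
KNOWN_SEVERITIES = {"critical", "warning", "info"}


def validate_task(task):
    """Return a list of human-readable problems; an empty list means none found."""
    problems = []
    for field in REQUIRED_FIELDS:
        if field not in task or task[field] in (None, "", [], {}):
            problems.append(f"missing or empty field: {field}")
    for data_type in task.get("include_data") or []:
        if data_type not in KNOWN_DATA_TYPES:
            problems.append(f"unknown data type in include_data: {data_type}")
    for level in task.get("severity_keywords") or {}:
        if level not in KNOWN_SEVERITIES:
            problems.append(f"unknown severity level: {level}")
    return problems
```

The function collects every problem instead of stopping at the first one, so a single run shows everything that needs fixing in a draft task.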

Task Template

id: your_task_id
name: Task Display Name
description: Brief description of what this task checks
category: configuration  # one of: configuration, performance, security

prompt: |
  Your analysis prompt here with placeholders:
  Configuration: {config}
  Logs: {logs}
  Metrics: {metrics}
  
  Analysis instructions...

include_data:
  - config
  - logs
  - metrics
  - admin

severity_keywords:
  critical:
    - "data loss"
    - "cluster down"
  warning:
    - "performance degraded"
    - "misconfiguration"
  info:
    - "recommendation"
    - "optimization"
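
To build intuition for how severity_keywords could turn LLM output into a severity level, here is a minimal sketch. It is not KCPilot's implementation: the function name classify_severity is hypothetical, and the matching rules (case-insensitive substring match, most severe level wins, fallback to info) are assumptions made for illustration.

```python
# Keywords copied from the task template above.
SEVERITY_KEYWORDS = {
    "critical": ["data loss", "cluster down"],
    "warning": ["performance degraded", "misconfiguration"],
    "info": ["recommendation", "optimization"],
}

# Assumed precedence: check the most severe level first.
SEVERITY_ORDER = ("critical", "warning", "info")


def classify_severity(finding_text, keywords=SEVERITY_KEYWORDS, default="info"):
    """Return the first (most severe) level whose keyword appears in the text."""
    text = finding_text.lower()
    for level in SEVERITY_ORDER:
        if any(keyword.lower() in text for keyword in keywords.get(level, [])):
            return level
    # Assumption: findings that match no keyword are treated as informational.
    return default
```

Under these assumptions, a finding that mentions both "data loss" and "recommendation" is classified as critical, which is why specific, unambiguous phrases make better critical keywords than common words.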

Available Analysis Tasks

The available analysis tasks, organized by category:

Configuration - General

authentication_authorization - Verifies authentication and authorization configuration
broker_count_ha - Analyzes broker count for high availability
in_transit_encryption - Checks for in-transit encryption configuration
isr_replication_margin - Validates ISR replication settings
jvm_heap_memory_ratio - Analyzes JVM heap memory ratio configuration
jvm_heap_preallocation - Checks JVM heap preallocation settings
jvm_heap_size_limit - Validates JVM heap size limits
minimum_cpu_cores - Verifies minimum CPU core requirements
multiple_log_dirs - Checks for multiple log directory configuration
rack_awareness - Validates rack awareness configuration
recent_log_errors - Analyzes recent log errors and warnings
separate_listeners - Checks for separate listener configuration
thread_configuration - Analyzes thread pool configuration

Configuration - KRaft

kraft_controller_ha_check - Validates KRaft controller high availability

Configuration - ZooKeeper

zookeeper_ha_check - Checks ZooKeeper ensemble high availability
zookeeper_heap_memory - Analyzes ZooKeeper heap memory configuration
zookeeper_heap_preallocation - Validates ZooKeeper heap preallocation

Environment Configuration

To use AI-powered analysis tasks, you need to configure your LLM API key:

# OpenAI API
export OPENAI_API_KEY=your_openai_api_key_here

# Alternative LLM API
export LLM_API_KEY=your_alternative_llm_api_key

# Enable debug logging
export LLM_DEBUG=true
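
A wrapper script can check for a key before starting a long analysis run. The sketch below is a hypothetical pre-flight helper, not KCPilot code: it only looks at the two variable names shown above, and the order in which it checks them (OPENAI_API_KEY first) is an assumption, since this document does not state which one KCPilot prefers when both are set.

```python
import os


def resolve_llm_api_key(env=None):
    """Return (variable_name, key) for the first non-empty key, or None.

    `env` defaults to the process environment; passing a plain dict makes
    the function easy to test.
    """
    env = os.environ if env is None else env
    for name in ("OPENAI_API_KEY", "LLM_API_KEY"):
        value = env.get(name, "").strip()
        if value:
            return name, value
    return None
```

If resolve_llm_api_key() returns None, the script can stop with a clear message instead of letting each task fail individually.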

Troubleshooting

Common Issues

No LLM API Key: Tasks fail without a configured API key; set OPENAI_API_KEY or LLM_API_KEY
Timeout Issues: Increase the timeout with --llm-timeout <seconds>
Data Not Found: Ensure the snapshot contains the data types listed in the task's include_data
Task Not Found: Verify that the task ID matches the file name in analysis_tasks/
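
For the "Task Not Found" case, a quick script can confirm whether a file exists for a given task ID. This is a minimal sketch under stated assumptions: the helper names are hypothetical, and it assumes task files are named <task-id>.yaml or <task-id>.yml directly inside analysis_tasks/, which this document implies but does not spell out.

```python
from pathlib import Path

# Assumed file extensions for task definitions.
TASK_SUFFIXES = (".yaml", ".yml")


def find_task_file(task_id, tasks_dir="analysis_tasks"):
    """Return the Path of the file matching task_id, or None if absent."""
    directory = Path(tasks_dir)
    for suffix in TASK_SUFFIXES:
        candidate = directory / f"{task_id}{suffix}"
        if candidate.is_file():
            return candidate
    return None


def list_task_ids(tasks_dir="analysis_tasks"):
    """Return the sorted task IDs implied by the file names in tasks_dir."""
    directory = Path(tasks_dir)
    if not directory.is_dir():
        return []
    return sorted(
        path.stem
        for path in directory.iterdir()
        if path.is_file() and path.suffix in TASK_SUFFIXES
    )
```

Comparing the requested ID against list_task_ids() makes typos easy to spot; kcpilot task list remains the authoritative source for what the tool itself recognizes.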

Debug Mode

Enable detailed logging to troubleshoot task execution:

# Debug specific task
RUST_LOG=kcpilot=debug kcpilot task test <task-id> <snapshot-path>

# Debug LLM interactions
kcpilot analyze <snapshot-path> --llmdbg