Agent Workflow¶

The service is a agentic system composed of a multi-step workflow.

The three-stage document processing pipeline converts, analyzes, and redacts sensitive information from PDF contracts using AI agents powered by AWS Bedrock models.

Agentic Workflow

Stage 1: PDF to Markdown Conversion¶

Multimodal agent uses AI vision capabilities:

Reads and understands PDF content
Converts PDF structure and content to clean markdown format
Preserves document structure, formatting, and content hierarchy

Output: Markdown file representing the original PDF document

Stage 2: Sensitive Data Detection¶

Detection agent specialized for sensitive information identification

Analyzes document content using structured output with SensitiveData model
Identifies and extracts sensitive information including:
Personal information (names, emails, phone numbers)
Company details (names, addresses, registration numbers)
Document metadata and analysis information
Applies strict guidelines: only extract information actually present in text

Output: Structured JSON file with detected sensitive information

Stage 3: Document Redaction¶

Redaction agent focused on content sanitization

Systematically redacts all sensitive information identified in Stage 2
Preserves document structure and non-sensitive content
Maintains document readability while removing confidential data

Output: Redacted markdown file with sensitive information removed

Agent Architecture¶

Specialized Agents: Each stage uses a purpose-built agent with specific: - System prompts tailored to the task - Curated tool sets for required operations - Model configurations optimized for the workload

Output Artifacts¶

Converted Document: Clean markdown representation of the original PDF
Sensitive Data Catalog: Structured JSON with all detected sensitive information
Redacted Document: Sanitized version safe for broader distribution
Process Metrics: Detailed logging and usage statistics for each stage

This workflow enables automated, AI-powered document redaction with full traceability and structured output suitable for compliance and audit requirements.

Example Input and Output¶

Sample input and output artifacts can be found in the data directory of this repository.

Raw Contract	Converted Document	Sensitive Data Catalog	Redacted Document
Raw spielbank_rocketbase_dienstleistungsvertrag	Converted spielbank_rocketbase_dienstleistungsvertrag	Extracted spielbank_rocketbase_dienstleistungsvertrag	Redacted spielbank_rocketbase_dienstleistungsvertrag
Raw rocketbase_aws_agreement	Converted rocketbase_aws_agreement	Extracted rocketbase_aws_agreement	Redacted rocketbase_aws_agreement
Raw spielbank_rocketbase_vertrag	Converted spielbank_rocketbase_vertrag	Extracted spielbank_rocketbase_vertrag	Redacted spielbank_rocketbase_vertrag