OJT Parser

Table of Contents

Overview
Getting Started
Input Data Requirements
How to Prompt the Agent
Example Usage
Best Practices
Troubleshooting
FAQ

Overview

The OJT Data Parser Agent extracts and parses structured information from On-the-Job Training (OJT) documents. It targets two categories of information:

Project (OJT) Summary Information: Comprehensive metadata about the training program
Project (OJT) Curriculum Tasks: Detailed task breakdowns including steps, standards, and learning objectives

Key Capabilities

Extracts structured data from various document formats
Parses complex OJT task analysis tables
Generates CSV files for easy data import and manipulation
Displays extracted data in readable markdown table format
Maintains data integrity by extracting only information that is present in the source document

Getting Started

Prerequisites

Access to OJT training documents (PDF, Word, or text formats)

Input Data Requirements

Document Format

The agent accepts documents containing:

Training Module Information
- Module title and descriptions
- Company and location details
- Personnel assignments (approver, assignee, lead)
- Timeline information
Task Analysis Tables
- Main task listings
- OJT hours allocations
- Task breakdown structures including:
  - Task steps
  - Key learning points
  - Standards and requirements
  - Required skills and knowledge
  - Training guidelines

Expected Structure

Documents should ideally contain:

Clear section headers (e.g., "MODULE", "MAIN TASKS", "OJT HOURS")
Structured tables for task analysis
Consistent formatting for dates and durations

How to Prompt the Agent

Basic Prompt Structure

Please extract OJT information from the uploaded document [filename]

Advanced Prompting Tips

Specific Extraction: Request focus on particular sections
```
Extract only the curriculum tasks from the OJT document
```

Multiple Documents: Process batch documents

Parse all OJT modules in the uploaded training materials

Validation Requests: Verify extraction completeness

Extract and validate all task analysis tables from the document

Example Usage

Example 1: Complete OJT Document Extraction

Input Document Content:

MODULE: Equipment Maintenance Training
COMPANY: TechCorp Industries
LOCATION: Singapore
APPROVER: John Smith
ASSIGNEE: Jane Doe
START DATE: 2024-01-15
END DATE: 2024-03-15

MAIN TASKS:
1. Preventive Maintenance - 40 hours
2. Troubleshooting - 60 hours

ON-THE-JOB TRAINING TASK ANALYSIS:
S/N || Main Task || Steps || Learning Points || Standards
1 || Preventive Maintenance || Check oil levels || Understanding viscosity || ISO 9001
2 || Troubleshooting || Diagnose issues || Problem identification || Company SOP

Expected Output:

Project Summary CSV with all metadata fields
Project Tasks CSV with formatted task descriptions
Markdown tables for immediate viewing
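
To make the shape of the Project Summary CSV concrete, here is a minimal Python sketch that turns the `KEY: value` header lines from the input above into one semicolon-delimited row. It is an illustration only, not the agent's implementation: the column names in `SUMMARY_FIELDS` and both function names are assumptions, since the agent's actual CSV schema is not listed in this guide.

```python
import csv
import io

# Hypothetical column names; the agent's real CSV schema may differ.
SUMMARY_FIELDS = [
    "module", "company", "location", "approver",
    "assignee", "start_date", "end_date",
]

def parse_summary(text: str) -> dict:
    """Collect 'KEY: value' lines whose key matches a known summary field."""
    summary = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if not sep:
            continue  # no colon on this line, so it is not a header line
        name = key.strip().lower().replace(" ", "_")
        if name in SUMMARY_FIELDS:
            summary[name] = value.strip()
    return summary

def summary_to_csv(summary: dict) -> str:
    """Write one summary row using the semicolon delimiter."""
    buf = io.StringIO()
    writer = csv.DictWriter(
        buf, fieldnames=SUMMARY_FIELDS, delimiter=";", lineterminator="\n"
    )
    writer.writeheader()
    # Fields that were not found stay empty; nothing is inferred.
    writer.writerow({f: summary.get(f, "") for f in SUMMARY_FIELDS})
    return buf.getvalue()
```

Calling `summary_to_csv(parse_summary(document_text))` on the sample input yields a header row followed by a single data row, with any field missing from the document left empty.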

Example 2: Task-Focused Extraction

Prompt:

Extract only the curriculum tasks with their detailed breakdown from the OJT document

Output Focus:

Detailed task_description_table with proper CSV formatting
Preserved table structure with || delimiters
All columns and rows maintained
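
If you need to work with a `||`-delimited task table after extraction, a small parser along these lines may help. This is a sketch under the assumption that the first non-empty line is the header row and that every row uses the same `||` delimiter; `parse_pipe_table` is a hypothetical helper name, not part of the agent.

```python
def parse_pipe_table(text: str, delimiter: str = "||") -> list[dict]:
    """Split a '||'-delimited table into one dict per data row."""
    lines = [ln for ln in text.splitlines() if ln.strip()]
    if not lines:
        return []
    # The first non-empty line is treated as the header row.
    header = [cell.strip() for cell in lines[0].split(delimiter)]
    rows = []
    for ln in lines[1:]:
        cells = [cell.strip() for cell in ln.split(delimiter)]
        rows.append(dict(zip(header, cells)))
    return rows
```

For the table in Example 1, the first returned row would map `"Main Task"` to `"Preventive Maintenance"` and `"Standards"` to `"ISO 9001"`.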

Best Practices

Document Preparation

Ensure Clear Structure: Use consistent headers and formatting
Complete Tables: Include all columns even if empty
Date Formatting: Use YYYY-MM-DD format for dates
Duration Units: Clearly specify hours, days, or weeks
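
Before uploading a document, dates can be checked against the YYYY-MM-DD recommendation with a few lines of Python. The sketch below is one possible check, and `is_iso_date` is a hypothetical helper name. The regular expression enforces zero-padded digits, and `strptime` rejects impossible calendar dates.

```python
import re
from datetime import datetime

ISO_DATE = re.compile(r"\d{4}-\d{2}-\d{2}")

def is_iso_date(value: str) -> bool:
    """Return True only for real calendar dates written as YYYY-MM-DD."""
    if not ISO_DATE.fullmatch(value):
        return False  # wrong shape, e.g. 15/01/2024 or 2024-1-5
    try:
        datetime.strptime(value, "%Y-%m-%d")
    except ValueError:
        return False  # right shape but not a real date, e.g. 2024-02-30
    return True
```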

Prompt Optimization

Be Specific: Clearly state what information you need extracted
Provide Context: Mention document type and expected content
Request Validation: Ask for confirmation of extracted fields

Data Handling

Review Output: Always verify extracted data against source
Check CSV Format: Ensure proper delimiter usage (semicolon)
Validate Completeness: Confirm all expected fields are present
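
The completeness check can be partly automated. The following sketch reads a semicolon-delimited CSV and reports columns that are absent and cells that are empty. The function names are hypothetical, and the list of expected columns is something you supply yourself, because the agent's schema is not listed here.

```python
import csv
import io

def missing_columns(csv_text: str, expected: list[str]) -> list[str]:
    """Return the expected column names that the CSV header lacks."""
    reader = csv.DictReader(io.StringIO(csv_text), delimiter=";")
    present = reader.fieldnames or []
    return [name for name in expected if name not in present]

def empty_fields(csv_text: str) -> list[tuple[int, str]]:
    """Return (data_row_number, column) pairs for every empty cell."""
    reader = csv.DictReader(io.StringIO(csv_text), delimiter=";")
    gaps = []
    for row_number, row in enumerate(reader, start=1):
        for column, value in row.items():
            # A None key holds surplus cells from over-long rows; skip it.
            if column is not None and not value:
                gaps.append((row_number, column))
    return gaps
```

An empty cell is not necessarily an error, since the agent leaves a field blank when the source does not contain it, so treat the result as a list of items to compare against the source document.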

Troubleshooting

Common Issues and Solutions

Issue 1: Missing Fields in Output

Symptom: Some expected fields are empty in the CSV.

Solution:

Verify the field exists in the source document
Check for alternative field names or variations
Ensure document quality is sufficient for parsing

Issue 2: Malformed Task Description Tables

Symptom: Task analysis tables are not properly formatted.

Solution:

Ensure source tables use consistent delimiters
Check for merged cells or irregular formatting
Manually clean the source document if necessary
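
Merged cells and inconsistent delimiters usually show up as rows with a different number of cells than the header. A quick way to locate them, assuming the table uses `||` as in the examples in this guide (`ragged_rows` is a hypothetical helper name):

```python
def ragged_rows(table_text: str, delimiter: str = "||") -> list[tuple[int, int]]:
    """Return (line_number, cell_count) for rows whose width differs from the header."""
    lines = [ln for ln in table_text.splitlines() if ln.strip()]
    if not lines:
        return []
    width = len(lines[0].split(delimiter))
    problems = []
    # Line numbers are 1-based and count non-empty lines; the header is line 1.
    for number, ln in enumerate(lines[1:], start=2):
        count = len(ln.split(delimiter))
        if count != width:
            problems.append((number, count))
    return problems
```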

Issue 3: CSV Export Fails

Symptom: Generated CSV files cannot be opened or imported.

Solution:

Verify delimiter consistency (semicolon)
Check for unescaped quotes in data
Ensure proper line endings
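
Two of these checks, delimiter consistency and line endings, are easy to script. The sketch below uses Python's standard `csv` module; the function names are hypothetical, and the semicolon default reflects the delimiter described in this guide.

```python
import csv
import io

def inconsistent_rows(csv_text: str, delimiter: str = ";") -> list[int]:
    """Return 1-based row numbers whose field count differs from the header's."""
    reader = csv.reader(io.StringIO(csv_text, newline=""), delimiter=delimiter)
    rows = list(reader)
    if not rows:
        return []
    width = len(rows[0])
    # Completely blank rows are ignored rather than reported.
    return [
        number
        for number, row in enumerate(rows[1:], start=2)
        if row and len(row) != width
    ]

def has_mixed_line_endings(raw: bytes) -> bool:
    """Return True when the file mixes CRLF and bare LF line endings."""
    crlf = raw.count(b"\r\n")
    lf_only = raw.count(b"\n") - crlf
    return crlf > 0 and lf_only > 0
```

Pass the decoded file text to `inconsistent_rows` and the raw bytes to `has_mixed_line_endings`.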

Issue 4: Incomplete Extraction

Symptom: Agent misses entire sections of the document.

Solution:

Break document into smaller sections
Ensure clear section headers
Remove complex formatting or graphics
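
For plain-text sources, breaking a document into smaller sections can be scripted. The sketch below starts a new section at each line beginning with `MODULE:`, as in the Example 1 input. Both the marker and the function name `split_by_module` are assumptions, so adjust the marker to whatever header your documents actually use.

```python
def split_by_module(text: str, marker: str = "MODULE:") -> list[str]:
    """Split text into sections, starting a new one at each marker line."""
    sections = []
    current = []
    for line in text.splitlines():
        if line.startswith(marker) and current:
            # Close the section collected so far before starting the next.
            sections.append("\n".join(current).strip())
            current = []
        current.append(line)
    if current:
        sections.append("\n".join(current).strip())
    # Drop sections that contained only blank lines.
    return [section for section in sections if section]
```

Each returned section can then be submitted to the agent separately. Any text before the first marker is returned as its own leading section.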

FAQ

Q1: What document formats are supported?

A: The agent primarily works with text-based formats including PDF, Word documents, and plain text files. Complex layouts or image-heavy documents may require preprocessing.

Q2: Can the agent handle multiple OJT modules in one document?

A: Yes, the agent can extract multiple modules. Each will be listed as a separate row in the Project Summary CSV, with corresponding tasks in the Tasks CSV.

Q3: How does the agent handle missing information?

A: The agent leaves fields empty when information is not found. It does not generate or infer missing data, maintaining data integrity.

Q4: What is the maximum document size the agent can process?

A: While there's no strict limit, very large documents (>100 pages) may benefit from being split into sections for optimal processing.

Q5: Can I customize the output format?

A: The agent outputs in CSV format with semicolon delimiters and markdown tables. The CSV structure is fixed but can be post-processed as needed.
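
As one example of post-processing, the semicolon-delimited output can be converted to comma-delimited CSV with the standard `csv` module. This is a minimal sketch, and `convert_delimiter` is a hypothetical helper name.

```python
import csv
import io

def convert_delimiter(csv_text: str, source: str = ";", target: str = ",") -> str:
    """Re-write CSV text with a different delimiter, quoting cells where needed."""
    rows = csv.reader(io.StringIO(csv_text, newline=""), delimiter=source)
    out = io.StringIO()
    writer = csv.writer(out, delimiter=target, lineterminator="\n")
    writer.writerows(rows)
    return out.getvalue()
```

Going through `csv.reader` and `csv.writer` rather than a plain string replace means a cell that already contains a comma is quoted in the output instead of being split into two cells.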

Q6: How accurate is the extraction?

A: Accuracy depends on document quality and structure. Well-formatted documents typically achieve 95%+ accuracy. Always review critical data.

Q7: Can the agent extract from scanned documents?

A: Scanned documents require OCR preprocessing. The agent works best with digital text rather than image-based content.

Q8: How are complex tables with merged cells handled?

A: The agent attempts to preserve table structure but merged cells may cause formatting issues. Consider reformatting complex tables before extraction.

Q9: Is there support for non-English documents?

A: The agent is optimized for English documents. Other languages may work but with reduced accuracy.

Q10: Can I extract specific date ranges or filter results?

A: The agent extracts all available data. Filtering should be done post-extraction using the CSV outputs.
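
A date-range filter over the CSV output might look like the sketch below. It assumes the CSV has a column holding YYYY-MM-DD dates; the column name `start_date` and the function name are hypothetical, so substitute the actual header from your output.

```python
import csv
import io
from datetime import date

def filter_by_date(
    csv_text: str, earliest: date, latest: date, column: str = "start_date"
) -> list[dict]:
    """Keep rows whose date in `column` falls within [earliest, latest]."""
    reader = csv.DictReader(io.StringIO(csv_text), delimiter=";")
    kept = []
    for row in reader:
        raw = row.get(column) or ""
        try:
            value = date.fromisoformat(raw)
        except ValueError:
            continue  # empty or malformed dates are skipped, not guessed
        if earliest <= value <= latest:
            kept.append(row)
    return kept
```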
