OJT Parser
Table of Contents
- Overview
- Getting Started
- Input Data Requirements
- How to Prompt the Agent
- Example Usage
- Best Practices
- Troubleshooting
- FAQ
Overview
The OJT Data Parser Agent extracts and parses structured information from On-the-Job Training (OJT) documents. It extracts two main categories of information:
- Project (OJT) Summary Information: Comprehensive metadata about the training program
- Project (OJT) Curriculum Tasks: Detailed task breakdowns including steps, standards, and learning objectives
Key Capabilities
- Extracts structured data from various document formats
- Parses complex OJT task analysis tables
- Generates CSV files for easy data import and manipulation
- Displays extracted data in readable markdown table format
- Maintains data integrity by extracting only information that is present in the source document
Getting Started
Prerequisites
- Access to OJT training documents (PDF, Word, or text formats)
Input Data Requirements
Document Format
The agent accepts documents containing:
Training Module Information
- Module title and descriptions
- Company and location details
- Personnel assignments (approver, assignee, lead)
- Timeline information
Task Analysis Tables
- Main task listings
- OJT hours allocations
- Task breakdown structures, including:
  - Task steps
  - Key learning points
  - Standards and requirements
  - Required skills and knowledge
  - Training guidelines
Expected Structure
Documents should ideally contain:
- Clear section headers (e.g., "MODULE", "MAIN TASKS", "OJT HOURS")
- Structured tables for task analysis
- Consistent formatting for dates and durations
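A quick pre-check can confirm that a document carries the section headers listed above before it is sent to the agent. The following is a minimal sketch, not part of the agent itself: the header names come from the examples in this guide, and the function name `missing_headers` is hypothetical.

```python
import re

# Header names taken from the examples in this guide; adjust to your documents.
EXPECTED_HEADERS = ("MODULE", "MAIN TASKS", "OJT HOURS")


def missing_headers(text, expected=EXPECTED_HEADERS):
    """Return the expected headers that never appear at the start of a line."""
    missing = []
    for header in expected:
        # Match the header at the beginning of any line, ignoring case.
        pattern = rf"^\s*{re.escape(header)}\b"
        if not re.search(pattern, text, flags=re.IGNORECASE | re.MULTILINE):
            missing.append(header)
    return missing
```

An empty result means every expected header was found; any names returned point to sections that may need to be added or relabeled in the source document.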
How to Prompt the Agent
Basic Prompt Structure
Please extract OJT information from the uploaded document [filename]
Advanced Prompting Tips
Specific Extraction: Request focus on particular sections
Extract only the curriculum tasks from the OJT document
Multiple Documents: Process batch documents
Parse all OJT modules in the uploaded training materials
Validation Requests: Verify extraction completeness
Extract and validate all task analysis tables from the document
Example Usage
Example 1: Complete OJT Document Extraction
Input Document Content:
MODULE: Equipment Maintenance Training
COMPANY: TechCorp Industries
LOCATION: Singapore
APPROVER: John Smith
ASSIGNEE: Jane Doe
START DATE: 2024-01-15
END DATE: 2024-03-15
MAIN TASKS:
1. Preventive Maintenance - 40 hours
2. Troubleshooting - 60 hours
ON-THE-JOB TRAINING TASK ANALYSIS:
S/N || Main Task || Steps || Learning Points || Standards
1 || Preventive Maintenance || Check oil levels || Understanding viscosity || ISO 9001
2 || Troubleshooting || Diagnose issues || Problem identification || Company SOP
Expected Output:
- Project Summary CSV with all metadata fields
- Project Tasks CSV with formatted task descriptions
- Markdown tables for immediate viewing
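To make the expected outputs more concrete, the sketch below parses the sample task analysis table and serializes it as semicolon-delimited CSV. It is an illustration only: the agent's actual CSV column names are not documented in this guide, so the columns here simply mirror the sample table's header row, and `parse_pipe_table` and `to_semicolon_csv` are hypothetical helpers.

```python
import csv
import io


def parse_pipe_table(lines):
    """Split `||`-delimited rows into dicts keyed by the header row."""
    rows = [
        [cell.strip() for cell in line.split("||")]
        for line in lines
        if line.strip()
    ]
    header, body = rows[0], rows[1:]
    return [dict(zip(header, row)) for row in body]


def to_semicolon_csv(records):
    """Serialize a list of dicts as semicolon-delimited CSV text."""
    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer,
        fieldnames=list(records[0]),
        delimiter=";",
        lineterminator="\n",
    )
    writer.writeheader()
    writer.writerows(records)
    return buffer.getvalue()


# The sample table from Example 1.
table = [
    "S/N || Main Task || Steps || Learning Points || Standards",
    "1 || Preventive Maintenance || Check oil levels || Understanding viscosity || ISO 9001",
    "2 || Troubleshooting || Diagnose issues || Problem identification || Company SOP",
]
tasks = parse_pipe_table(table)
print(to_semicolon_csv(tasks))
```

Comparing a hand-built file like this against the agent's Project Tasks CSV is one way to spot rows or columns that were dropped during extraction.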
Example 2: Task-Focused Extraction
Prompt:
Extract only the curriculum tasks with their detailed breakdown from the OJT document
Output Focus:
- Detailed task_description_table with proper CSV formatting
- Preserved table structure with || delimiters
- All columns and rows maintained
Best Practices
Document Preparation
- Ensure Clear Structure: Use consistent headers and formatting
- Complete Tables: Include all columns even if empty
- Date Formatting: Use YYYY-MM-DD format for dates
- Duration Units: Clearly specify hours, days, or weeks
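The date and duration conventions above can be checked before a document is submitted. This is a minimal sketch under stated assumptions: `is_iso_date` and `parse_duration` are hypothetical helper names, and the accepted units are limited to the three mentioned in this guide (hours, days, weeks).

```python
import re
from datetime import datetime


def is_iso_date(value):
    """Return True if `value` is a real calendar date written as YYYY-MM-DD."""
    # The regex enforces zero-padded fields such as 2024-01-05.
    if not re.fullmatch(r"\d{4}-\d{2}-\d{2}", value):
        return False
    try:
        # strptime rejects impossible dates such as 2024-02-30.
        datetime.strptime(value, "%Y-%m-%d")
    except ValueError:
        return False
    return True


def parse_duration(text):
    """Turn '40 hours' into (40.0, 'hours'); return None if no known unit is given."""
    match = re.fullmatch(
        r"\s*(\d+(?:\.\d+)?)\s*(hour|day|week)s?\s*",
        text,
        flags=re.IGNORECASE,
    )
    if match is None:
        return None
    # Normalize the unit to its lowercase plural form.
    return float(match.group(1)), match.group(2).lower() + "s"
```

A bare number such as `40` returns `None`, which flags durations whose unit was left implicit.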
Prompt Optimization
- Be Specific: Clearly state what information you need extracted
- Provide Context: Mention document type and expected content
- Request Validation: Ask for confirmation of extracted fields
Data Handling
- Review Output: Always verify extracted data against source
- Check CSV Format: Ensure proper delimiter usage (semicolon)
- Validate Completeness: Confirm all expected fields are present
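Completeness checks on the CSV output can be partly automated. The sketch below reports which columns are blank in each row of a semicolon-delimited file; the function name `empty_fields` and the column names in the usage example are hypothetical, since the agent's exact schema is not listed in this guide.

```python
import csv
import io


def empty_fields(csv_text, delimiter=";"):
    """Map each 1-based data row number to the columns that are blank in it."""
    reader = csv.DictReader(io.StringIO(csv_text), delimiter=delimiter)
    report = {}
    for row_number, row in enumerate(reader, start=1):
        blanks = [
            name
            for name, value in row.items()
            # Skip the None key that DictReader uses for surplus cells.
            if name is not None and not (value or "").strip()
        ]
        if blanks:
            report[row_number] = blanks
    return report


# Usage with illustrative column names.
summary = "module;company;approver\nEquipment Maintenance Training;TechCorp Industries;\n"
print(empty_fields(summary))
```

A blank field is not necessarily an error, because the agent leaves fields empty when the source lacks the information; the report only tells you where to look in the source document.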
Troubleshooting
Common Issues and Solutions
Issue 1: Missing Fields in Output
Symptom: Some expected fields are empty in the CSV
Solution:
- Verify the field exists in the source document
- Check for alternative field names or variations
- Ensure document quality is sufficient for parsing
Issue 2: Malformed Task Description Tables
Symptom: Task analysis tables are not properly formatted
Solution:
- Ensure source tables use consistent delimiters
- Check for merged cells or irregular formatting
- Manually clean the source document if necessary
Issue 3: CSV Export Fails
Symptom: Generated CSV files cannot be opened or imported
Solution:
- Verify delimiter consistency (semicolon)
- Check for unescaped quotes in data
- Ensure proper line endings
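The three checks above can be approximated with Python's standard `csv` module. This is a sketch rather than the agent's own export logic: `inconsistent_rows` and `write_rows` are hypothetical helpers, and the choice of `\r\n` line endings is an assumption that suits most spreadsheet tools.

```python
import csv
import io


def inconsistent_rows(csv_text, delimiter=";"):
    """Return the source line numbers of records whose field count differs from the header."""
    # newline="" leaves \n and \r\n untouched so the csv module can handle them.
    reader = csv.reader(io.StringIO(csv_text, newline=""), delimiter=delimiter)
    header = next(reader)
    return [reader.line_num for row in reader if len(row) != len(header)]


def write_rows(rows, delimiter=";"):
    """Re-serialize rows so quotes and embedded delimiters are escaped correctly."""
    buffer = io.StringIO(newline="")
    writer = csv.writer(
        buffer,
        delimiter=delimiter,
        quoting=csv.QUOTE_MINIMAL,
        lineterminator="\r\n",
    )
    writer.writerows(rows)
    return buffer.getvalue()
```

Re-writing a suspect file through `write_rows` doubles any embedded quote characters and wraps fields that contain a semicolon, which resolves many import failures.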
Issue 4: Incomplete Extraction
Symptom: Agent misses entire sections of the document
Solution:
- Break document into smaller sections
- Ensure clear section headers
- Remove complex formatting or graphics
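Breaking a document into smaller sections can be done along its section headers once the text has been extracted. The sketch below assumes plain text input and uses the header names that appear in this guide's examples; both the `HEADER` pattern and the `split_sections` function are hypothetical and would need adjusting for documents that label their sections differently.

```python
import re

# Header names from this guide's examples, matched at the start of a line.
HEADER = re.compile(
    r"^(MODULE|MAIN TASKS|OJT HOURS|ON-THE-JOB TRAINING TASK ANALYSIS)\b.*$",
    re.MULTILINE,
)


def split_sections(text):
    """Return (header_name, section_text) pairs, one per recognized header."""
    matches = list(HEADER.finditer(text))
    sections = []
    for index, match in enumerate(matches):
        # A section runs until the next header, or to the end of the text.
        if index + 1 < len(matches):
            end = matches[index + 1].start()
        else:
            end = len(text)
        sections.append((match.group(1), text[match.start():end].strip()))
    return sections
```

Each returned section can then be submitted to the agent separately, which keeps individual requests small and makes it easier to see which section an extraction problem comes from.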
FAQ
Q1: What document formats are supported?
A: The agent primarily works with text-based formats including PDF, Word documents, and plain text files. Complex layouts or image-heavy documents may require preprocessing.
Q2: Can the agent handle multiple OJT modules in one document?
A: Yes, the agent can extract multiple modules. Each will be listed as a separate row in the Project Summary CSV, with corresponding tasks in the Tasks CSV.
Q3: How does the agent handle missing information?
A: The agent leaves fields empty when information is not found. It does not generate or infer missing data, maintaining data integrity.
Q4: What is the maximum document size the agent can process?
A: While there's no strict limit, very large documents (>100 pages) may benefit from being split into sections for optimal processing.
Q5: Can I customize the output format?
A: The agent outputs in CSV format with semicolon delimiters and markdown tables. The CSV structure is fixed but can be post-processed as needed.
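As one example of post-processing, the sketch below converts the semicolon-delimited output to comma-delimited CSV for tools that expect commas. The function name `convert_delimiter` is hypothetical, and the sample column names are illustrative rather than the agent's documented schema.

```python
import csv
import io


def convert_delimiter(csv_text, source=";", target=","):
    """Re-write CSV text with a different delimiter, re-quoting fields as needed."""
    reader = csv.reader(io.StringIO(csv_text, newline=""), delimiter=source)
    buffer = io.StringIO(newline="")
    writer = csv.writer(buffer, delimiter=target, lineterminator="\n")
    # Fields that contain the new delimiter are quoted automatically.
    writer.writerows(reader)
    return buffer.getvalue()


converted = convert_delimiter(
    "module;location\nEquipment Maintenance Training;Singapore\n"
)
print(converted)
```

Going through the `csv` module instead of a plain text replacement keeps fields that already contain commas intact.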
Q6: How accurate is the extraction?
A: Accuracy depends on document quality and structure. Well-formatted documents typically achieve 95%+ accuracy. Always review critical data.
Q7: Can the agent extract from scanned documents?
A: Scanned documents require OCR preprocessing. The agent works best with digital text rather than image-based content.
Q8: How are complex tables with merged cells handled?
A: The agent attempts to preserve table structure, but merged cells may cause formatting issues. Consider reformatting complex tables before extraction.
Q9: Is there support for non-English documents?
A: The agent is optimized for English documents. Other languages may work but with reduced accuracy.
Q10: Can I extract specific date ranges or filter results?
A: The agent extracts all available data. Filtering should be done post-extraction using the CSV outputs.
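A date-range filter over the Project Summary CSV might look like the following sketch. It assumes a column named `start_date` holding YYYY-MM-DD values; that column name, the function `filter_by_start_date`, and the second sample row are all hypothetical and used only for illustration.

```python
import csv
import io
from datetime import date


def filter_by_start_date(csv_text, earliest, latest, column="start_date"):
    """Keep rows whose start date falls within [earliest, latest]."""
    reader = csv.DictReader(io.StringIO(csv_text), delimiter=";")
    kept = []
    for row in reader:
        try:
            start = date.fromisoformat(row[column])
        except (KeyError, TypeError, ValueError):
            # Skip rows with a missing, empty, or malformed date.
            continue
        if earliest <= start <= latest:
            kept.append(row)
    return kept


# Illustrative data: the second module is invented for this example.
summary_csv = (
    "module;start_date;end_date\n"
    "Equipment Maintenance Training;2024-01-15;2024-03-15\n"
    "Safety Induction;2023-06-01;2023-06-30\n"
)
rows_2024 = filter_by_start_date(summary_csv, date(2024, 1, 1), date(2024, 12, 31))
print([row["module"] for row in rows_2024])
```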