Skill Parser

Overview
Getting Started
Input Data Requirements
How to Prompt the Agent
Example Usage
Best Practices
Troubleshooting
FAQ

Overview

The Skill Data Parser Agent is a specialized extraction tool designed to parse and structure skills-related information from various documents. It identifies and extracts comprehensive skill definitions, categorizations, and relationships, outputting them in a structured CSV format for easy integration into skill management systems.

Key Capabilities

Extracts multiple skills from a single document
Captures comprehensive skill metadata (10+ attributes)
Maintains data integrity without inferring missing information
Outputs structured CSV with semicolon delimiters
Supports batch skill extraction
Preserves original document wording and context

Extracted Skill Attributes

Title and description
Category and subcategory classification
Proficiency levels
Prerequisites and related skills
Industry applications
References and sources

Getting Started

Prerequisites

A document containing skill information

Input Data Requirements

Document Formats Supported

Text Documents: Plain text, markdown files
Structured Documents: Word documents, PDFs
Training Materials: Skill frameworks, competency models
Course Catalogs: Training course descriptions
Job Descriptions: Role-specific skill requirements

Expected Content Structure

Documents should contain:

Skill Identification
- Clear skill names or titles
- Skill descriptions or definitions
Skill Metadata (Optional but valuable)
- Categories or domains
- Proficiency level indicators
- Prerequisite requirements
- Related or complementary skills
Context Information
- Industry applications
- Use cases or examples
- Reference materials

Data Quality Requirements

Clear skill delineation (headers, bullets, sections)
Consistent terminology
Structured organization (preferred but not required)
Complete attribute descriptions where available

How to Prompt the Agent

Basic Prompt Structure

Extract skill information from the uploaded document [filename]

Advanced Prompting Techniques

1. Comprehensive Extraction

Parse all skills with complete metadata from the training framework document

2. Category-Focused Extraction

Extract technical skills and their relationships from the competency model

3. Industry-Specific Parsing

Identify and extract oil & gas industry skills with proficiency levels

4. Validation Request

Extract skills and verify all prerequisite relationships are captured

Example Usage

Example 1: Technical Skills Framework

Input Document Content:

SKILL: Python Programming
Description: Proficiency in Python for data analysis and automation
Category: Technical Skills
Subcategory: Programming Languages
Level: Intermediate to Advanced
Prerequisites: Basic programming concepts, Computer science fundamentals
Related Skills: Data Analysis, Machine Learning, Automation
Applications: Data science projects, Backend development, Scripting
Industry: Technology, Finance, Healthcare
References: Python.org documentation, Real Python tutorials

SKILL: Project Management
Description: Ability to plan, execute, and deliver projects successfully
Category: Management Skills
Subcategory: Leadership
Level: Advanced
Prerequisites: Communication skills, Time management
Related Skills: Risk Management, Stakeholder Management, Agile Methodologies
Applications: IT projects, Construction, Consulting
Industry: All industries
References: PMBOK Guide, Agile Manifesto

Expected Output CSV:

"title";"description";"category";"subcategory";"level";"prerequisites";"related_skills";"applications";"industry";"references"
"Python Programming";"Proficiency in Python for data analysis and automation";"Technical Skills";"Programming Languages";"Intermediate to Advanced";"Basic programming concepts, Computer science fundamentals";"Data Analysis, Machine Learning, Automation";"Data science projects, Backend development, Scripting";"Technology, Finance, Healthcare";"Python.org documentation, Real Python tutorials"
"Project Management";"Ability to plan, execute, and deliver projects successfully";"Management Skills";"Leadership";"Advanced";"Communication skills, Time management";"Risk Management, Stakeholder Management, Agile Methodologies";"IT projects, Construction, Consulting";"All industries";"PMBOK Guide, Agile Manifesto"

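The output above can be loaded with Python's standard `csv` module. This is a minimal sketch, assuming the file uses semicolon delimiters and double-quoted fields exactly as shown. The function name `load_skills` and the decision to split list-style fields on commas are illustrative assumptions, not part of the agent.

```python
import csv
import io

# Sample output copied from Example 1 (one data row only).
SAMPLE = (
    '"title";"description";"category";"subcategory";"level";'
    '"prerequisites";"related_skills";"applications";"industry";"references"\n'
    '"Python Programming";"Proficiency in Python for data analysis and automation";'
    '"Technical Skills";"Programming Languages";"Intermediate to Advanced";'
    '"Basic programming concepts, Computer science fundamentals";'
    '"Data Analysis, Machine Learning, Automation";'
    '"Data science projects, Backend development, Scripting";'
    '"Technology, Finance, Healthcare";'
    '"Python.org documentation, Real Python tutorials"\n'
)

# Columns whose values appear as comma-separated lists in the example.
# Treating them as lists is an assumption made for this sketch.
LIST_FIELDS = ("prerequisites", "related_skills", "applications", "industry", "references")


def load_skills(text):
    """Parse semicolon-delimited, quoted CSV text into a list of dicts."""
    reader = csv.DictReader(io.StringIO(text), delimiter=";", quotechar='"')
    skills = []
    for row in reader:
        for field in LIST_FIELDS:
            # Split on commas and drop empty items; an empty cell becomes [].
            value = row.get(field) or ""
            row[field] = [item.strip() for item in value.split(",") if item.strip()]
        skills.append(row)
    return skills


skills = load_skills(SAMPLE)
print(skills[0]["title"])
print(skills[0]["prerequisites"])
```

Splitting on commas is only safe when individual list items contain no commas themselves; keep the raw strings if that cannot be guaranteed.
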
Example 2: Competency Model with Multiple Skills

Input Document:

Digital Marketing Competencies

1. Search Engine Optimization (SEO)
   - Category: Digital Marketing
   - Description: Optimizing web content for search engines
   - Level: Beginner to Expert
   - Prerequisites: Basic HTML, Content writing
   - Applications: Website optimization, Content strategy

2. Social Media Marketing
   - Category: Digital Marketing
   - Description: Creating and managing social media campaigns
   - Level: Intermediate
   - Related Skills: Content Creation, Analytics, Community Management
   - Industry: Marketing, E-commerce, Entertainment

Agent Processing:

Identifies 2 distinct skills
Extracts available attributes for each
Leaves empty fields where data is not provided
Maintains original descriptions
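
To illustrate how empty fields might look for this input, the sketch below writes the two skills using the column order from the Example 1 header. The source does not show the actual output for this example, so the rows produced here are an assumption about its shape; `to_csv` is a hypothetical helper.

```python
import csv
import io

# Column order assumed to match the header shown in Example 1.
FIELDS = [
    "title", "description", "category", "subcategory", "level",
    "prerequisites", "related_skills", "applications", "industry", "references",
]

# Only attributes present in the Example 2 input are filled in.
extracted = [
    {
        "title": "Search Engine Optimization (SEO)",
        "description": "Optimizing web content for search engines",
        "category": "Digital Marketing",
        "level": "Beginner to Expert",
        "prerequisites": "Basic HTML, Content writing",
        "applications": "Website optimization, Content strategy",
    },
    {
        "title": "Social Media Marketing",
        "description": "Creating and managing social media campaigns",
        "category": "Digital Marketing",
        "level": "Intermediate",
        "related_skills": "Content Creation, Analytics, Community Management",
        "industry": "Marketing, E-commerce, Entertainment",
    },
]


def to_csv(rows):
    """Write rows as semicolon-delimited CSV with every field quoted."""
    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer,
        fieldnames=FIELDS,
        delimiter=";",
        quoting=csv.QUOTE_ALL,
        restval="",  # attributes missing from the source stay empty
        lineterminator="\n",
    )
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


output = to_csv(extracted)
print(output)
```

Missing attributes such as the subcategory appear as empty quoted fields (`""`), so every row keeps the same number of columns.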

Example 3: Industry-Specific Skills

Input:

Oil & Gas Technical Competencies

Drilling Operations
Description: Knowledge of drilling techniques, equipment, and safety procedures
Category: Upstream Operations
Subcategory: Drilling
Prerequisites: HSE Training, Mechanical Engineering basics
Related Skills: Well Control, Mud Engineering, Directional Drilling
Applications: Offshore drilling, Onshore operations, Well intervention
Industry: Oil & Gas
Level: Advanced
References: IADC Standards, API Specifications

Output Focus:

Complete extraction of industry-specific terminology
Preservation of technical references
Proper categorization within industry framework

Best Practices

Document Preparation

Structure Content: Use clear headers and sections for skills
Consistent Formatting: Maintain uniform attribute labeling
Complete Descriptions: Provide comprehensive skill information
Clear Delineation: Separate different skills distinctly

Extraction Optimization

Batch Processing: Group related skills in single documents
Attribute Consistency: Use standard attribute names
Reference Preservation: Include all source materials
Relationship Mapping: Clearly indicate skill connections

Data Quality Assurance

Verify Completeness: Check all skills are extracted
Validate Relationships: Confirm prerequisite and related skill links
Review Categories: Ensure proper classification
Check Formatting: Verify CSV structure and delimiters
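
The formatting check can be partly automated. The following is a rough sketch, assuming the ten-column header from Example 1 is the expected layout; `check_structure` and its messages are hypothetical and not part of the agent.

```python
import csv
import io

# Expected columns, taken from the Example 1 header.
EXPECTED_HEADER = [
    "title", "description", "category", "subcategory", "level",
    "prerequisites", "related_skills", "applications", "industry", "references",
]


def check_structure(text):
    """Return a list of human-readable problems found in the CSV text."""
    rows = list(csv.reader(io.StringIO(text), delimiter=";", quotechar='"'))
    if not rows:
        return ["file is empty"]
    problems = []
    if rows[0] != EXPECTED_HEADER:
        problems.append("header does not match the expected columns")
    # Row numbers count parsed CSV records, starting with the header as row 1.
    for number, row in enumerate(rows[1:], start=2):
        if len(row) != len(EXPECTED_HEADER):
            problems.append(
                f"row {number}: expected {len(EXPECTED_HEADER)} fields, found {len(row)}"
            )
        elif not row[0].strip():
            problems.append(f"row {number}: title is empty")
    return problems


header_line = ";".join(f'"{name}"' for name in EXPECTED_HEADER)
good_row = ";".join(['"Drilling Operations"'] + ['""'] * 9)
short_row = ";".join(['"Well Control"'] + ['""'] * 5)

print(check_structure(header_line + "\n" + good_row + "\n"))
print(check_structure(header_line + "\n" + short_row + "\n"))
```

An empty list means no structural problems were found; it says nothing about whether the extracted values are correct.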

Post-Processing

Data Cleaning: Remove any extraction artifacts
Standardization: Normalize skill names and categories
Deduplication: Identify and merge duplicate skills
Integration Prep: Format for target system import
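
As one possible approach to the standardization and deduplication steps, the sketch below merges rows whose titles match after whitespace and case normalization. The matching rule and the "first non-empty value wins" merge policy are assumptions chosen for illustration; your own criteria may differ.

```python
def normalize_name(name):
    """Collapse repeated whitespace and ignore case for comparison."""
    return " ".join(name.split()).casefold()


def deduplicate(rows):
    """Merge rows with matching normalized titles.

    The first occurrence is kept; empty fields are filled from later duplicates.
    """
    merged = {}
    for row in rows:
        key = normalize_name(row["title"])
        if key not in merged:
            merged[key] = dict(row)
            continue
        kept = merged[key]
        for field, value in row.items():
            if not kept.get(field) and value:
                kept[field] = value
    return list(merged.values())


rows = [
    {"title": "Python Programming", "category": "Technical Skills", "level": ""},
    {"title": "python  programming", "category": "", "level": "Advanced"},
    {"title": "Project Management", "category": "Management Skills", "level": "Advanced"},
]

for skill in deduplicate(rows):
    print(skill)
```

Because the first occurrence's wording is kept, the original title from the source document is preserved in the merged row.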

Troubleshooting

Common Issues and Solutions

Issue 1: Incomplete Skill Extraction

Symptom: Some skills in the document are not extracted

Solution:

Ensure clear skill demarcation (headers, bullets)
Check for consistent skill naming patterns
Verify document formatting is parseable
Split complex documents into sections

Issue 2: Missing Attributes

Symptom: Expected skill attributes are empty

Solution:

Confirm attributes exist in source document
Check for alternative attribute names
Ensure consistent attribute formatting
Review agent's extraction patterns

FAQ

Q1: How many skills can be extracted at once?

A: There is no fixed limit on the number of skills extracted from a document, though very large documents (100+ skills) may benefit from being split into batches for optimal performance.

Q2: What if my document doesn't have all skill attributes?

A: The agent only extracts available information. Missing attributes are left empty in the CSV, maintaining data integrity without fabrication.

Q3: Can the agent handle skills in table format?

A: Yes, the agent can parse skills from tables, lists, or narrative text. Tables often provide the clearest structure for extraction.

Q4: How does the agent handle multi-line descriptions?

A: Multi-line content is preserved using \n characters within the CSV fields, maintaining formatting while ensuring CSV validity.
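
The answer above does not specify whether the field contains real newline characters or the two-character sequence `\n`. The sketch below assumes the latter and converts it back to line breaks after parsing; the sample row is made up for illustration. If the fields contain real newlines inside quotes, Python's `csv` module reads them without this extra step.

```python
import csv
import io

# Hypothetical output row where the description holds a literal backslash + n.
raw = '"title";"description"\n"Well Control";"Line one\\nLine two"\n'

rows = list(csv.DictReader(io.StringIO(raw), delimiter=";", quotechar='"'))

# Turn the literal two-character escape into an actual line break.
description = rows[0]["description"].replace("\\n", "\n")
print(description.splitlines())
```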

Q5: Can I extract skills from job descriptions?

A: Yes, the agent can identify and extract skills mentioned in job descriptions, though the metadata may be limited compared to dedicated skill frameworks.

Q6: Is there support for skill taxonomies?

A: The agent preserves hierarchical relationships through category/subcategory fields. Complex taxonomies may require post-processing.
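
As an example of such post-processing, the following sketch groups parsed rows into a two-level category/subcategory tree. The function `build_taxonomy` and the placeholder labels for missing values are assumptions made for this illustration.

```python
from collections import defaultdict


def build_taxonomy(rows):
    """Group skill titles by category, then by subcategory."""
    tree = defaultdict(lambda: defaultdict(list))
    for row in rows:
        # Placeholder labels keep skills with missing fields visible in the tree.
        category = row.get("category") or "(uncategorized)"
        subcategory = row.get("subcategory") or "(none)"
        tree[category][subcategory].append(row["title"])
    # Convert nested defaultdicts to plain dicts for easier printing and export.
    return {category: dict(subs) for category, subs in tree.items()}


rows = [
    {"title": "Python Programming", "category": "Technical Skills",
     "subcategory": "Programming Languages"},
    {"title": "Drilling Operations", "category": "Upstream Operations",
     "subcategory": "Drilling"},
    {"title": "Social Media Marketing", "category": "Digital Marketing",
     "subcategory": ""},
]

taxonomy = build_taxonomy(rows)
print(taxonomy)
```

Deeper hierarchies than two levels are not represented in the CSV columns and would need an additional mapping maintained outside the extraction output.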

Q7: How accurate is the extraction process?

A: Accuracy depends on document structure and clarity. Well-formatted documents typically achieve 95%+ extraction accuracy.

Q8: Can the agent infer skill levels from context?

A: No, the agent only extracts explicitly stated information. It will not infer or generate skill levels or other attributes.

Q9: What about skills in multiple languages?

A: The agent is optimized for English. Mixed-language documents may have reduced extraction accuracy for non-English content.

Q10: Can I customize the CSV output format?

A: The CSV structure is fixed with semicolon delimiters and quoted fields. Post-processing can reformat as needed for specific systems.
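
A simple reformatting step might look like the sketch below, which rewrites the semicolon-delimited output with a different delimiter. `redelimit` is a hypothetical helper; it relies only on Python's standard `csv` module, which re-quotes any field containing the new delimiter.

```python
import csv
import io


def redelimit(text, new_delimiter=","):
    """Rewrite semicolon-delimited CSV text using a different delimiter."""
    reader = csv.reader(io.StringIO(text), delimiter=";", quotechar='"')
    buffer = io.StringIO()
    writer = csv.writer(
        buffer,
        delimiter=new_delimiter,
        quoting=csv.QUOTE_MINIMAL,  # quote only fields that need it
        lineterminator="\n",
    )
    writer.writerows(reader)
    return buffer.getvalue()


source = '"title";"industry"\n"Python Programming";"Technology, Finance, Healthcare"\n'
converted = redelimit(source)
print(converted)
```

Parsing and re-writing, rather than replacing `;` with `,` in the raw text, matters here because many fields already contain commas.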

Q11: How are duplicate skills handled?

A: Each skill instance is extracted as a separate row. Deduplication should be performed post-extraction based on your criteria.

Q12: Can the agent distinguish soft skills from technical skills?

A: Yes, as long as skills are clearly categorized in the source document, the agent will maintain these distinctions in the extraction.
