User Profile Parser

Overview
Getting Started
Input Data Requirements
How to Prompt the Agent
Example Usage
Best Practices
Troubleshooting
FAQ

Overview

The User Data Parser Agent is a specialized extraction tool that parses and structures user information, including personally identifiable information (PII), from a variety of documents. It extracts comprehensive user profiles covering personal details, professional information, skills, and work experience, and outputs them in structured CSV format for HR systems and databases.

Key Capabilities

Extracts multiple user profiles from single documents
Parses detailed work experience history
Maintains data integrity without inference
Outputs two linked CSV files (summary and experience)
Handles various document formats (resumes, CVs, profiles)
Preserves original data without modification

Extracted Data Categories

User Summary: Personal info, contact, education, skills
Work Experience: Employment history with detailed role information

Getting Started

Prerequisites

Documents containing user information

Privacy Considerations

When handling PII:

Ensure compliance with data protection regulations
Implement appropriate access controls
Use secure storage for extracted data
Follow organizational privacy policies
Consider data anonymization needs

Input Data Requirements

Document Formats Supported

Resumes/CVs: PDF, Word, text formats
Employee Profiles: Structured personnel records
Application Forms: Job or program applications
Professional Portfolios: Comprehensive career documents
LinkedIn Exports: Profile data exports
HR Records: Employee information sheets

User Summary Fields (17 attributes)

Personal Information
- first_name
- last_name
- email
- phone
- address
- date_of_birth (YYYY-MM-DD)
Professional Information
- job_title
- department
- company
- skills (comma-separated)
- certifications
Additional Information
- education
- languages
- interests
- projects
- references
- additional_info

Work Experience Fields (9 attributes)

email (links to user)
job_title
company
start_date (YYYY-MM-DD)
end_date (YYYY-MM-DD)
responsibilities
achievements
skills_used
references
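The two schemas can be written out with Python's standard csv module. This is a minimal sketch, not the agent's own implementation. It makes several assumptions:

- It follows the column order of the header rows in Example 1, where date_of_birth follows company.
- It uses a semicolon delimiter with every value quoted.
- The helper names write_csv and write_profile_csvs are hypothetical.
- Multi-line values are written as real line breaks inside quoted fields rather than as literal \n sequences.

```python
import csv
import io

# Column order taken from the header rows of the Example 1 output.
SUMMARY_FIELDS = [
    "first_name", "last_name", "email", "phone", "address",
    "job_title", "department", "company", "date_of_birth",
    "education", "skills", "certifications", "languages",
    "interests", "projects", "references", "additional_info",
]
EXPERIENCE_FIELDS = [
    "email", "job_title", "company", "start_date", "end_date",
    "responsibilities", "achievements", "skills_used", "references",
]


def write_csv(rows, fields):
    """Serialize dict rows as semicolon-delimited CSV with every value quoted.

    Keys missing from a row are written as empty strings (restval), so absent
    data stays empty instead of being invented.
    """
    buf = io.StringIO()
    writer = csv.DictWriter(
        buf, fieldnames=fields, delimiter=";",
        quoting=csv.QUOTE_ALL, restval="", lineterminator="\n",
    )
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()


def write_profile_csvs(users, experiences):
    """Hypothetical helper returning (summary_csv, experience_csv) as text."""
    return (write_csv(users, SUMMARY_FIELDS),
            write_csv(experiences, EXPERIENCE_FIELDS))


summary, experience = write_profile_csvs(
    [{"first_name": "John", "last_name": "Smith", "email": "john.smith@email.com"}],
    [{"email": "john.smith@email.com", "job_title": "Data Analyst", "company": "DataCo"}],
)
print(summary)
```

DictWriter rejects keys that are not in the field list. A misspelled column name therefore raises ValueError instead of silently producing an extra column.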

How to Prompt the Agent

Basic Prompt Structure

Extract user information from the uploaded document [filename]

Advanced Prompting Techniques

1. Single Resume Extraction

Parse the resume and extract the complete user profile, including
personal details, skills, and work experience

2. Batch Profile Processing

Extract all user profiles from the HR database export, 
including complete work histories

3. Specific Field Focus

Focus on extracting professional information: 
job titles, companies, skills, and certifications

4. Date-Specific Extraction

Extract user data ensuring all dates are in YYYY-MM-DD format, 
particularly for work experience timeline

Example Usage

Example 1: Complete Resume Extraction

Input Document:

John Smith
Email: john.smith@email.com
Phone: +1-555-0123
Address: 123 Main St, New York, NY 10001
Date of Birth: 1985-06-15

EDUCATION:
MBA, Harvard Business School, 2010
BS Computer Science, MIT, 2007

SKILLS:
Python, Java, Machine Learning, Project Management, Data Analysis

WORK EXPERIENCE:
Senior Data Scientist | TechCorp Inc. | 2018-01-15 to Present
- Led ML model development for customer prediction
- Improved accuracy by 35% through feature engineering
- Managed team of 5 data scientists
Skills used: Python, TensorFlow, SQL, Leadership

Data Analyst | DataCo | 2015-06-01 to 2017-12-31
- Analyzed business metrics and created dashboards
- Automated reporting saving 20 hours weekly
Skills used: SQL, Tableau, Excel, Python

CERTIFICATIONS:
AWS Certified Solutions Architect
PMP Certification

LANGUAGES:
English (Native), Spanish (Fluent), Mandarin (Basic)

PROJECTS:
Customer Churn Prediction Model
Real-time Analytics Dashboard

Expected Output:

User Summary CSV:

"first_name";"last_name";"email";"phone";"address";"job_title";"department";"company";"date_of_birth";"education";"skills";"certifications";"languages";"interests";"projects";"references";"additional_info"
"John";"Smith";"john.smith@email.com";"+1-555-0123";"123 Main St, New York, NY 10001";"Senior Data Scientist";"";"TechCorp Inc.";"1985-06-15";"MBA, Harvard Business School, 2010\nBS Computer Science, MIT, 2007";"Python, Java, Machine Learning, Project Management, Data Analysis";"AWS Certified Solutions Architect\nPMP Certification";"English (Native), Spanish (Fluent), Mandarin (Basic)";"";""Customer Churn Prediction Model\nReal-time Analytics Dashboard";"";""

Work Experience CSV:

"email";"job_title";"company";"start_date";"end_date";"responsibilities";"achievements";"skills_used";"references"
"john.smith@email.com";"Senior Data Scientist";"TechCorp Inc.";"2018-01-15";"";"Led ML model development for customer prediction\nManaged team of 5 data scientists";"Improved accuracy by 35% through feature engineering";"Python, TensorFlow, SQL, Leadership";""
"john.smith@email.com";"Data Analyst";"DataCo";"2015-06-01";"2017-12-31";"Analyzed business metrics and created dashboards";"Automated reporting saving 20 hours weekly";"SQL, Tableau, Excel, Python";""

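Downstream, the two files can be rejoined through the shared email column. The sketch below assumes the semicolon-delimited, fully quoted layout shown in Example 1. It uses a hypothetical load_profiles helper, and the sample input is trimmed to a few of the Example 1 columns for brevity.

```python
import csv
import io
from collections import defaultdict


def load_profiles(summary_text, experience_text):
    """Attach each work-experience row to the user row with the same email."""
    users = list(csv.DictReader(io.StringIO(summary_text), delimiter=";"))
    jobs_by_email = defaultdict(list)
    for job in csv.DictReader(io.StringIO(experience_text), delimiter=";"):
        jobs_by_email[job["email"]].append(job)
    for user in users:
        # Users without any work-experience rows get an empty list.
        user["experience"] = jobs_by_email.get(user["email"], [])
    return users


summary = (
    '"first_name";"last_name";"email"\n'
    '"John";"Smith";"john.smith@email.com"\n'
)
experience = (
    '"email";"job_title";"company";"start_date";"end_date"\n'
    '"john.smith@email.com";"Senior Data Scientist";"TechCorp Inc.";"2018-01-15";""\n'
    '"john.smith@email.com";"Data Analyst";"DataCo";"2015-06-01";"2017-12-31"\n'
)

profiles = load_profiles(summary, experience)
for job in profiles[0]["experience"]:
    # An empty end_date marks the current position.
    print(job["job_title"], job["start_date"], job["end_date"] or "present")
```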
Example 2: Multiple User Profiles

Input Document:

EMPLOYEE RECORDS

1. Jane Doe
   Email: jane.doe@company.com
   Title: Marketing Manager
   Department: Marketing
   Skills: Digital Marketing, SEO, Content Strategy, Analytics
   
   Experience:
   - Marketing Manager, CurrentCo (2020-Present)
   - Marketing Specialist, PreviousCo (2017-2020)

2. Bob Johnson
   Email: bob.johnson@company.com
   Title: Software Engineer
   Department: Engineering
   Skills: Java, Spring Boot, Microservices, AWS
   
   Experience:
   - Software Engineer, CurrentCo (2019-Present)
   - Junior Developer, StartupXYZ (2016-2019)

Output:

User Summary CSV with 2 rows (Jane and Bob)
Work Experience CSV with 4 rows (2 positions each)

Example 3: Partial Information Handling

Input with Missing Fields:

Sarah Williams
Email: sarah.w@email.com
Current Role: Product Manager

Skills: Product Strategy, Agile, User Research
Education: Bachelor's Degree in Business

Work History:
Product Manager at TechStart (2021-Present)
- Launching new products
- Managing product roadmap

Output Behavior:

Populated fields: first_name, last_name, email, job_title, skills, education
Empty fields: phone, address, date_of_birth, etc.
Work experience: Single row with available information

Best Practices

Document Preparation

Consistent Formatting: Use clear sections and headers
Complete Contact Info: Include all available contact details
Date Formats: Use consistent date formatting
Skill Lists: Clearly separate individual skills
Work History: Include start/end dates for all positions

Data Quality

Verify Extraction: Review parsed data for accuracy
Check Relationships: Ensure every work experience row's email matches a row in the User Summary CSV
Date Validation: Confirm dates are in YYYY-MM-DD format
Skill Parsing: Verify skills are properly comma-separated
Multi-line Content: Check preservation of line breaks
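The date and skill checks in this list can be partly automated. This is a minimal sketch, assuming rows are dicts keyed by the Work Experience field names. The helpers is_iso_date and check_experience_row are hypothetical. An empty end_date is accepted because Example 1 leaves it blank for a current position.

```python
import re
from datetime import date

ISO_DATE = re.compile(r"[0-9]{4}-[0-9]{2}-[0-9]{2}")


def is_iso_date(value):
    """True if value is a real calendar date written exactly as YYYY-MM-DD."""
    if not ISO_DATE.fullmatch(value):
        return False
    try:
        date.fromisoformat(value)  # rejects impossible dates such as 2017-02-30
    except ValueError:
        return False
    return True


def check_experience_row(row):
    """Return human-readable problems found in one work-experience row."""
    problems = []
    start = row.get("start_date", "")
    end = row.get("end_date", "")
    for field, value in (("start_date", start), ("end_date", end)):
        # Blank dates are allowed: data missing from the source stays empty.
        if value and not is_iso_date(value):
            problems.append(f"{field} is not a valid YYYY-MM-DD date: {value!r}")
    # ISO dates compare correctly as plain strings.
    if is_iso_date(start) and is_iso_date(end) and start > end:
        problems.append("start_date is after end_date")
    skills = row.get("skills_used", "")
    if skills and any(not item.strip() for item in skills.split(",")):
        problems.append("skills_used contains an empty item")
    return problems


print(check_experience_row({"start_date": "2015-06-01", "end_date": "2017-12-31",
                            "skills_used": "SQL, Tableau, Excel, Python"}))
print(check_experience_row({"start_date": "2017-02-30", "end_date": "",
                            "skills_used": "SQL,,Python"}))
```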

Privacy and Compliance

PII Handling: Follow data protection regulations
Access Control: Limit access to extracted data
Data Retention: Implement appropriate retention policies
Anonymization: Consider removing PII when not needed
Audit Trail: Maintain logs of data extraction activities
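One way to reduce PII while keeping the two files linkable is to drop direct identifiers and replace the email with a keyed hash (HMAC-SHA256). This is pseudonymization rather than full anonymization. The following are assumptions to adapt to your own policy:

- the choice of dropped columns
- the key handling
- the helper names pseudonymize_email, anonymize_summary, and pseudonymize_experience

```python
import hashlib
import hmac

# Columns removed outright in this sketch (an assumption; adjust to policy).
# Free-text columns such as references may still contain PII and are kept here.
DIRECT_IDENTIFIERS = {"first_name", "last_name", "phone", "address", "date_of_birth"}


def pseudonymize_email(email, key):
    """Map an email to a stable keyed hash so the two CSVs still join."""
    normalized = email.strip().lower()
    return hmac.new(key, normalized.encode("utf-8"), hashlib.sha256).hexdigest()


def anonymize_summary(row, key):
    """Drop direct identifiers from a User Summary row and hash the email."""
    out = {k: v for k, v in row.items() if k not in DIRECT_IDENTIFIERS}
    out["email"] = pseudonymize_email(row["email"], key)
    return out


def pseudonymize_experience(row, key):
    """Hash the linking email in a Work Experience row, keeping other fields."""
    return {**row, "email": pseudonymize_email(row["email"], key)}


# Hypothetical key; in practice load it from a secrets store, never hard-code it.
KEY = b"replace-with-a-secret-key"
user = anonymize_summary({"first_name": "John", "last_name": "Smith",
                          "email": "john.smith@email.com", "phone": "+1-555-0123",
                          "skills": "Python, Java"}, KEY)
job = pseudonymize_experience({"email": "john.smith@email.com",
                               "company": "DataCo"}, KEY)
print(user["email"] == job["email"])
```

Using a keyed hash rather than a plain hash means someone without the key cannot confirm a guessed address by hashing it themselves.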

Post-Processing

Data Validation: Check for required fields
Duplicate Detection: Identify potential duplicate profiles
Standardization: Normalize job titles and skills
Integration: Prepare for HR system import
Quality Checks: Verify email formats and phone numbers
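Duplicate detection and email format checks can be sketched as below. The email pattern is deliberately loose (one @, no whitespace, a dot in the domain). It is an assumption, not a full RFC 5322 validator. The names find_duplicates and invalid_emails are hypothetical.

```python
import re
from collections import defaultdict

# Deliberately loose: one "@", no whitespace, and a dot in the domain part.
EMAIL_RE = re.compile(r"[^@\s]+@[^@\s]+\.[^@\s]+")


def normalize_email(email):
    return email.strip().lower()


def find_duplicates(users):
    """Map each normalized email shared by 2+ summary rows to those row indexes."""
    groups = defaultdict(list)
    for index, user in enumerate(users):
        email = normalize_email(user.get("email", ""))
        if email:  # rows without an email are reported by invalid_emails instead
            groups[email].append(index)
    return {email: rows for email, rows in groups.items() if len(rows) > 1}


def invalid_emails(users):
    """Return email values that are missing or do not look like an address."""
    return [u.get("email", "") for u in users
            if not EMAIL_RE.fullmatch(u.get("email", "").strip())]


users = [
    {"email": "jane.doe@company.com"},
    {"email": "Jane.Doe@company.com "},
    {"email": "bob.johnson@company.com"},
    {"email": "bob.johnson"},
]
print(find_duplicates(users))  # {'jane.doe@company.com': [0, 1]}
print(invalid_emails(users))   # ['bob.johnson']
```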

Troubleshooting

Common Issues and Solutions

Issue 1: Missing User Information

Symptom: Expected fields are empty in the output.
Solution:

Verify information exists in source document
Check for alternative field labels
Ensure consistent section headers
Review document structure for clarity

Issue 2: Work Experience Not Linked

Symptom: Work experience rows don't connect to users.
Solution:

Ensure email addresses are consistent
Check for email extraction accuracy
Verify work sections are clearly associated with users
Use email as primary key for linking
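Because email is the linking key, unlinked rows can be found with a set lookup. This sketch uses a hypothetical unlinked_experience helper and compares addresses case-insensitively after trimming whitespace. That comparison is an assumption: strictly, only the domain part of an address is case-insensitive.

```python
def unlinked_experience(users, experiences):
    """Return work-experience rows whose email matches no User Summary row."""
    known = {u["email"].strip().lower() for u in users}
    return [row for row in experiences
            if row["email"].strip().lower() not in known]


users = [{"email": "jane.doe@company.com"}]
experiences = [
    {"email": "Jane.Doe@company.com", "job_title": "Marketing Manager"},
    {"email": "bob.johnson@company.com", "job_title": "Software Engineer"},
]
for row in unlinked_experience(users, experiences):
    print("No matching user for", row["email"], "-", row["job_title"])
```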

Issue 3: Date Format Issues

Symptom: Dates are not in YYYY-MM-DD format.
Solution:

Provide clear date formatting in source
Check for ambiguous date representations
Ensure consistent date patterns
Consider preprocessing dates before extraction
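If you preprocess dates yourself, a conservative approach is to convert only formats that cannot be misread and to flag everything else. The sketch below is one such approach under these assumptions:

- The format list is illustrative.
- Month names are parsed in English (the default C locale).
- to_iso_date is a hypothetical helper.

Returning None for inputs such as 03/04/2020 or a bare year follows the same no-inference rule the agent applies to missing data.

```python
from datetime import datetime

# Formats treated as unambiguous in this sketch (assumption: extend as needed).
UNAMBIGUOUS_FORMATS = (
    "%Y-%m-%d", "%Y/%m/%d",        # year first
    "%B %d, %Y", "%b %d, %Y",      # June 15, 1985 / Jun 15, 1985
    "%d %B %Y", "%d %b %Y",        # 15 June 1985 / 15 Jun 1985
)
CURRENT_MARKERS = {"present", "current", "now"}


def to_iso_date(raw):
    """Return YYYY-MM-DD, "" for an ongoing role, or None when review is needed."""
    text = raw.strip()
    if text.lower() in CURRENT_MARKERS:
        return ""  # matches the blank end_date used for current positions
    for fmt in UNAMBIGUOUS_FORMATS:
        try:
            return datetime.strptime(text, fmt).date().isoformat()
        except ValueError:
            continue
    # Anything else (e.g. "03/04/2020" or a bare year such as "2020") would
    # require guessing day/month order or missing parts, so leave it for review.
    return None


for raw in ["June 15, 1985", "2018/01/15", "Present", "03/04/2020", "2020"]:
    print(repr(raw), "->", repr(to_iso_date(raw)))
```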

Issue 4: Skills Not Properly Separated

Symptom: Skills appear as a single string instead of a list.
Solution:

Use clear delimiters in source (commas, bullets)
Avoid complex skill descriptions
Separate skill categories clearly
Review extraction for proper parsing
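A preprocessing step can split skills on common delimiters and rejoin them as comma-separated values. This sketch treats commas, semicolons, bullet characters, and line breaks as separators, and it drops case-insensitive duplicates. The separator set and the normalize_skills name are assumptions. A skill that itself contains a comma would be split, which is one reason to avoid complex skill descriptions.

```python
import re

# Separators treated as list boundaries (assumption): commas, semicolons,
# line breaks, and the bullet character U+2022.
SKILL_SEPARATORS = re.compile(r"[,;\n\u2022]+")


def normalize_skills(raw):
    """Split a free-form skills string and rejoin it as comma-separated values."""
    seen = set()
    skills = []
    for part in SKILL_SEPARATORS.split(raw):
        skill = part.strip(" \t-*")  # also drops leading "-" or "*" bullets
        key = skill.lower()
        if skill and key not in seen:
            seen.add(key)
            skills.append(skill)
    return ", ".join(skills)


print(normalize_skills("Python, Java; SQL\n\u2022 Tableau\n- Excel\npython"))
```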

Issue 5: Multiple Users Mixed

Symptom: Information from different users is combined.
Solution:

Ensure clear separation between profiles
Use distinct headers for each user
Number or label user sections
Avoid ambiguous content placement
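When profiles are numbered as in Example 2, the document can be split into one chunk per user before extraction. The chunk count can then be compared with the number of rows in the User Summary CSV. This is a minimal sketch, assuming each profile starts at an unindented line such as 1. Jane Doe. split_profiles is a hypothetical helper, and any text before the first numbered header is discarded.

```python
import re

# Assumption: a new profile starts at an unindented line such as "1. Jane Doe".
PROFILE_HEADER = re.compile(r"^[0-9]+\.\s+\S", re.MULTILINE)


def split_profiles(text):
    """Return one text chunk per numbered profile, in document order."""
    starts = [m.start() for m in PROFILE_HEADER.finditer(text)]
    ends = starts[1:] + [len(text)]
    return [text[s:e].strip() for s, e in zip(starts, ends)]


document = """EMPLOYEE RECORDS

1. Jane Doe
   Email: jane.doe@company.com
   Title: Marketing Manager

2. Bob Johnson
   Email: bob.johnson@company.com
   Title: Software Engineer
"""
chunks = split_profiles(document)
print(len(chunks), "profiles found")
```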

FAQ

Q1: How many users can be extracted at once?

A: The agent can handle multiple users in a single document. For best results, process users in batches of 10-20.

Q2: What if some fields are missing?

A: The agent only extracts available information. Missing fields are left empty, maintaining data integrity without fabrication.

Q3: Can it handle different resume formats?

A: Yes, the agent adapts to various formats including chronological, functional, and combination resumes.

Q4: How are skills extracted and formatted?

A: Skills are extracted as comma-separated values. Complex skill descriptions may need preprocessing.

Q5: Can work experience span multiple companies?

A: Yes, each position is extracted as a separate row in the work experience CSV, linked by email.

Q6: Is date standardization automatic?

A: The agent attempts to standardize dates to YYYY-MM-DD format but works best with consistent source formatting.

Q7: How does it handle international formats?

A: The agent can process international documents but may need guidance on specific formats (e.g., date conventions).

Q8: Can it extract from scanned documents?

A: Scanned documents require OCR preprocessing. The agent works with text-based content.

Q9: What about data privacy compliance?

A: The agent extracts data as-is. Implementing privacy compliance is the responsibility of the user/organization.

Q10: Can fields be customized?

A: The field structure is fixed, but post-processing can reorganize data as needed.

Q11: How are references handled?

A: References are extracted as text. Detailed reference contact information should be structured clearly in the source.

Q12: Can it parse LinkedIn profiles?

A: Yes, if exported to a readable format (PDF, text). Direct LinkedIn HTML may require preprocessing.
