User Profile Parser
Table of Contents
- Overview
- Getting Started
- Input Data Requirements
- How to Prompt the Agent
- Example Usage
- Best Practices
- Troubleshooting
- FAQ
Overview
The User Data Parser Agent is a specialized extraction tool designed to parse and structure user information and personally identifiable information (PII) from various documents. It extracts comprehensive user profiles including personal details, professional information, skills, and work experience, outputting them in structured CSV format for HR systems and databases.
Key Capabilities
- Extracts multiple user profiles from single documents
- Parses detailed work experience history
- Maintains data integrity without inference
- Outputs two CSV files (user summary and work experience) linked by email
- Handles various document formats (resumes, CVs, profiles)
- Preserves original values without modification, apart from standardizing dates to YYYY-MM-DD where possible
Extracted Data Categories
- User Summary: Personal info, contact, education, skills
- Work Experience: Employment history with detailed role information
Getting Started
Prerequisites
- Documents containing user information
Privacy Considerations
When handling PII:
- Ensure compliance with data protection regulations
- Implement appropriate access controls
- Use secure storage for extracted data
- Follow organizational privacy policies
- Consider data anonymization needs
Input Data Requirements
Document Formats Supported
- Resumes/CVs: PDF, Word, text formats
- Employee Profiles: Structured personnel records
- Application Forms: Job or program applications
- Professional Portfolios: Comprehensive career documents
- LinkedIn Exports: Profile data exports
- HR Records: Employee information sheets
User Summary Fields (17 attributes)
Personal Information
- first_name
- last_name
- email
- phone
- address
- date_of_birth (YYYY-MM-DD)
Professional Information
- job_title
- department
- company
- skills (comma-separated)
- certifications
Additional Information
- education
- languages
- interests
- projects
- references
- additional_info
Work Experience Fields (9 attributes)
- email (links to user)
- job_title
- company
- start_date (YYYY-MM-DD)
- end_date (YYYY-MM-DD; empty for current positions)
- responsibilities
- achievements
- skills_used
- references
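The two field lists can be written down as constants and used to check a parsed header row. This is a minimal sketch, not part of the agent itself: the column order is taken from the example CSV headers in Example Usage, and `check_header` is a hypothetical helper name.

```python
# Column order follows the example CSV headers shown in Example Usage.
USER_SUMMARY_FIELDS = [
    "first_name", "last_name", "email", "phone", "address",
    "job_title", "department", "company", "date_of_birth",
    "education", "skills", "certifications", "languages",
    "interests", "projects", "references", "additional_info",
]

WORK_EXPERIENCE_FIELDS = [
    "email", "job_title", "company", "start_date", "end_date",
    "responsibilities", "achievements", "skills_used", "references",
]


def check_header(header, expected):
    """Return (missing, unexpected) column names for a parsed header row."""
    missing = [name for name in expected if name not in header]
    unexpected = [name for name in header if name not in expected]
    return missing, unexpected


# Hypothetical header with one dropped column and one extra column.
missing, unexpected = check_header(
    ["email", "job_title", "company", "start_date", "salary"],
    WORK_EXPERIENCE_FIELDS,
)
print("missing:", missing)
print("unexpected:", unexpected)
```

Running a check like this before import makes it easier to notice when a column has been dropped or renamed.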
How to Prompt the Agent
Basic Prompt Structure
Extract user information from the uploaded document [filename]
Advanced Prompting Techniques
1. Single Resume Extraction
Parse the resume and extract complete user profile including
personal details, skills, and work experience
2. Batch Profile Processing
Extract all user profiles from the HR database export,
including complete work histories
3. Specific Field Focus
Focus on extracting professional information:
job titles, companies, skills, and certifications
4. Date-Specific Extraction
Extract user data ensuring all dates are in YYYY-MM-DD format,
particularly for work experience timeline
Example Usage
Example 1: Complete Resume Extraction
Input Document:
John Smith
Email: john.smith@email.com
Phone: +1-555-0123
Address: 123 Main St, New York, NY 10001
Date of Birth: 1985-06-15
EDUCATION:
MBA, Harvard Business School, 2010
BS Computer Science, MIT, 2007
SKILLS:
Python, Java, Machine Learning, Project Management, Data Analysis
WORK EXPERIENCE:
Senior Data Scientist | TechCorp Inc. | 2018-01-15 to Present
- Led ML model development for customer prediction
- Improved accuracy by 35% through feature engineering
- Managed team of 5 data scientists
Skills used: Python, TensorFlow, SQL, Leadership
Data Analyst | DataCo | 2015-06-01 to 2017-12-31
- Analyzed business metrics and created dashboards
- Automated reporting saving 20 hours weekly
Skills used: SQL, Tableau, Excel, Python
CERTIFICATIONS:
AWS Certified Solutions Architect
PMP Certification
LANGUAGES:
English (Native), Spanish (Fluent), Mandarin (Basic)
PROJECTS:
Customer Churn Prediction Model
Real-time Analytics Dashboard
Expected Output:
User Summary CSV:
"first_name";"last_name";"email";"phone";"address";"job_title";"department";"company";"date_of_birth";"education";"skills";"certifications";"languages";"interests";"projects";"references";"additional_info"
"John";"Smith";"john.smith@email.com";"+1-555-0123";"123 Main St, New York, NY 10001";"Senior Data Scientist";"";"TechCorp Inc.";"1985-06-15";"MBA, Harvard Business School, 2010\nBS Computer Science, MIT, 2007";"Python, Java, Machine Learning, Project Management, Data Analysis";"AWS Certified Solutions Architect\nPMP Certification";"English (Native), Spanish (Fluent), Mandarin (Basic)";"";""Customer Churn Prediction Model\nReal-time Analytics Dashboard";"";""
Work Experience CSV:
"email";"job_title";"company";"start_date";"end_date";"responsibilities";"achievements";"skills_used";"references"
"john.smith@email.com";"Senior Data Scientist";"TechCorp Inc.";"2018-01-15";"";"Led ML model development for customer prediction\nManaged team of 5 data scientists";"Improved accuracy by 35% through feature engineering";"Python, TensorFlow, SQL, Leadership";""
"john.smith@email.com";"Data Analyst";"DataCo";"2015-06-01";"2017-12-31";"Analyzed business metrics and created dashboards";"Automated reporting saving 20 hours weekly";"SQL, Tableau, Excel, Python";""
Example 2: Multiple User Profiles
Input Document:
EMPLOYEE RECORDS
1. Jane Doe
Email: jane.doe@company.com
Title: Marketing Manager
Department: Marketing
Skills: Digital Marketing, SEO, Content Strategy, Analytics
Experience:
- Marketing Manager, CurrentCo (2020-Present)
- Marketing Specialist, PreviousCo (2017-2020)
2. Bob Johnson
Email: bob.johnson@company.com
Title: Software Engineer
Department: Engineering
Skills: Java, Spring Boot, Microservices, AWS
Experience:
- Software Engineer, CurrentCo (2019-Present)
- Junior Developer, StartupXYZ (2016-2019)
Output:
- User Summary CSV with 2 rows (Jane and Bob)
- Work Experience CSV with 4 rows (2 positions each)
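Downstream code can read both files and regroup the experience rows under each user by email. The sketch below assumes the semicolon delimiter and double-quote style shown in Example 1. The sample data is an abbreviated, hand-written stand-in for the Example 2 output (three columns only), and the function names are illustrative.

```python
import csv
import io


def read_semicolon_csv(text):
    """Parse semicolon-delimited, double-quoted CSV text into a list of dicts."""
    return list(csv.DictReader(io.StringIO(text), delimiter=";", quotechar='"'))


def group_experience_by_email(experiences):
    """Map each email to the list of work experience rows that carry it."""
    grouped = {}
    for row in experiences:
        grouped.setdefault(row["email"], []).append(row)
    return grouped


# Abbreviated sample data based on Example 2 (assumed, not actual agent output).
summary_csv = '''"first_name";"last_name";"email"
"Jane";"Doe";"jane.doe@company.com"
"Bob";"Johnson";"bob.johnson@company.com"
'''
experience_csv = '''"email";"job_title";"company"
"jane.doe@company.com";"Marketing Manager";"CurrentCo"
"jane.doe@company.com";"Marketing Specialist";"PreviousCo"
"bob.johnson@company.com";"Software Engineer";"CurrentCo"
"bob.johnson@company.com";"Junior Developer";"StartupXYZ"
'''

users = read_semicolon_csv(summary_csv)
grouped = group_experience_by_email(read_semicolon_csv(experience_csv))
for user in users:
    positions = grouped.get(user["email"], [])
    print(user["first_name"], user["last_name"], len(positions))
```

For this sample, each of the two users ends up with two positions, matching the row counts listed above.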
Example 3: Partial Information Handling
Input with Missing Fields:
Sarah Williams
Email: sarah.w@email.com
Current Role: Product Manager
Skills: Product Strategy, Agile, User Research
Education: Bachelor's Degree in Business
Work History:
Product Manager at TechStart (2021-Present)
- Launching new products
- Managing product roadmap
Output Behavior:
- Populated fields: first_name, last_name, email, job_title, skills, education
- Empty fields: phone, address, date_of_birth, etc.
- Work experience: Single row with available information
Best Practices
Document Preparation
- Consistent Formatting: Use clear sections and headers
- Complete Contact Info: Include all available contact details
- Date Formats: Use consistent date formatting
- Skill Lists: Clearly separate individual skills
- Work History: Include start/end dates for all positions
Data Quality
- Verify Extraction: Review parsed data for accuracy
- Check Relationships: Ensure email links work experiences correctly
- Date Validation: Confirm dates are in YYYY-MM-DD format
- Skill Parsing: Verify skills are properly comma-separated
- Multi-line Content: Check that line breaks within a field are preserved (shown as \n in the examples)
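Some of these checks can be automated once the CSVs are loaded as dictionaries. The following is a rough sketch under that assumption: `review_experience_rows` and its message wording are hypothetical, and empty date fields are treated as "not provided" rather than as errors.

```python
import re
from datetime import datetime

DATE_FIELDS = ("start_date", "end_date")


def is_iso_date(value):
    """True only for a real calendar date written as zero-padded YYYY-MM-DD."""
    if not re.fullmatch(r"\d{4}-\d{2}-\d{2}", value):
        return False
    try:
        datetime.strptime(value, "%Y-%m-%d")
    except ValueError:
        return False
    return True


def review_experience_rows(rows, known_emails):
    """Return human-readable problems found in work experience rows."""
    problems = []
    for index, row in enumerate(rows, start=1):
        if row.get("email", "") not in known_emails:
            problems.append(f"row {index}: email does not match any user summary row")
        for field in DATE_FIELDS:
            value = row.get(field, "")
            if value and not is_iso_date(value):  # empty means "not provided"
                problems.append(f"row {index}: {field} is not YYYY-MM-DD: {value!r}")
    return problems


# Hand-written sample rows; the second has a mistyped email and a loose date.
rows = [
    {"email": "john.smith@email.com", "start_date": "2018-01-15", "end_date": ""},
    {"email": "jon.smith@email.com", "start_date": "06/2015", "end_date": "2017-12-31"},
]
problems = review_experience_rows(rows, known_emails={"john.smith@email.com"})
for problem in problems:
    print(problem)
```

A report like this does not replace a manual review, but it narrows down which rows to look at first.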
Privacy and Compliance
- PII Handling: Follow data protection regulations
- Access Control: Limit access to extracted data
- Data Retention: Implement appropriate retention policies
- Anonymization: Consider removing PII when not needed
- Audit Trail: Maintain logs of data extraction activities
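Where downstream analysis does not need direct identifiers, one option is to drop the personal columns and replace the email with a keyed hash, so the two CSVs can still be joined. This is a sketch of pseudonymization, not full anonymization. The column selection, the 16-character token length, and the function names are assumptions, and the secret key must be stored separately from the data.

```python
import hashlib
import hmac

# Columns treated as direct identifiers in this sketch (an assumption).
PII_COLUMNS = ("first_name", "last_name", "phone", "address", "date_of_birth")


def pseudonymize_email(email, secret_key):
    """Return a stable token for an email using HMAC-SHA256 with a secret key."""
    normalized = email.strip().lower().encode("utf-8")
    return hmac.new(secret_key, normalized, hashlib.sha256).hexdigest()[:16]


def anonymize_row(row, secret_key):
    """Drop identifier columns and replace the email with its token."""
    cleaned = {key: value for key, value in row.items() if key not in PII_COLUMNS}
    if cleaned.get("email"):
        cleaned["email"] = pseudonymize_email(cleaned["email"], secret_key)
    return cleaned


key = b"replace-with-a-real-secret"  # placeholder value for illustration only
summary_row = {"first_name": "John", "last_name": "Smith",
               "email": "John.Smith@email.com", "skills": "Python, SQL"}
experience_row = {"email": "john.smith@email.com", "company": "TechCorp Inc."}

safe_summary = anonymize_row(summary_row, key)
safe_experience = anonymize_row(experience_row, key)
print(safe_summary["email"] == safe_experience["email"])
```

Using the same key for both files keeps the email link between summary and experience rows intact.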
Post-Processing
- Data Validation: Check for required fields
- Duplicate Detection: Identify potential duplicate profiles
- Standardization: Normalize job titles and skills
- Integration: Prepare for HR system import
- Quality Checks: Verify email formats and phone numbers
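Duplicate detection and a basic email check can be sketched as follows. The email pattern is deliberately loose and only catches obvious problems. Grouping by lowercased email with a fallback to full name is one possible heuristic, not the agent's own behavior.

```python
import re
from collections import defaultdict

# Loose pattern: something@something.tld with no whitespace.
EMAIL_PATTERN = re.compile(r"[^@\s]+@[^@\s]+\.[^@\s]+")


def looks_like_email(value):
    """True when the value has the rough shape of an email address."""
    return EMAIL_PATTERN.fullmatch(value) is not None


def find_duplicate_profiles(users):
    """Group user rows by lowercased email, falling back to full name."""
    groups = defaultdict(list)
    for user in users:
        email = user.get("email", "").strip().lower()
        name = f'{user.get("first_name", "")} {user.get("last_name", "")}'.strip().lower()
        key = email or name
        if key:
            groups[key].append(user)
    return {key: rows for key, rows in groups.items() if len(rows) > 1}


# Hand-written sample rows; the first two differ only in email casing.
users = [
    {"first_name": "Jane", "last_name": "Doe", "email": "Jane.Doe@company.com"},
    {"first_name": "Jane", "last_name": "Doe", "email": "jane.doe@company.com "},
    {"first_name": "Bob", "last_name": "Johnson", "email": "bob.johnson@company.com"},
]
duplicates = find_duplicate_profiles(users)
print(sorted(duplicates))
```

Candidate duplicates found this way still need a human decision on which row to keep.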
Troubleshooting
Common Issues and Solutions
Issue 1: Missing User Information
Symptom: Expected fields are empty in output
Solution:
- Verify information exists in source document
- Check for alternative field labels
- Ensure consistent section headers
- Review document structure for clarity
Issue 2: Work Experience Not Linked
Symptom: Work experience rows don't connect to users
Solution:
- Ensure email addresses are consistent
- Check for email extraction accuracy
- Verify work sections are clearly associated with users
- Use email as primary key for linking
Issue 3: Date Format Issues
Symptom: Dates not in YYYY-MM-DD format
Solution:
- Provide clear date formatting in source
- Check for ambiguous date representations
- Ensure consistent date patterns
- Consider preprocessing dates before extraction
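Date preprocessing can be as simple as trying a short list of known formats and leaving anything else untouched. In the sketch below, the format list is an assumption to adapt to the source documents, and month-name parsing depends on the active locale (English names are assumed). Ambiguous numeric dates such as 03/04/2015 are deliberately not guessed.

```python
from datetime import datetime

# Formats tried in order; extend this tuple to match the source documents.
KNOWN_FORMATS = ("%Y-%m-%d", "%Y/%m/%d", "%d %B %Y", "%B %d, %Y")


def to_iso_date(text):
    """Return YYYY-MM-DD for a recognized date, or None when unsure."""
    cleaned = text.strip()
    for fmt in KNOWN_FORMATS:
        try:
            return datetime.strptime(cleaned, fmt).date().isoformat()
        except ValueError:
            continue
    return None


for sample in ("15 January 2018", "June 1, 2015", "2017/12/31", "03/04/2015", "Present"):
    print(repr(sample), "->", to_iso_date(sample))
```

Returning None for unrecognized input keeps the original text available for manual review instead of silently producing a wrong date.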
Issue 4: Skills Not Properly Separated
Symptom: Skills appear as single string instead of list
Solution:
- Use clear delimiters in source (commas, bullets)
- Avoid complex skill descriptions
- Separate skill categories clearly
- Review extraction for proper parsing
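If the source mixes bullets, semicolons, and line breaks, a small preprocessing step can normalize the skills text into the comma-separated form before extraction. This is an illustrative sketch: the set of delimiters and the helper names are assumptions.

```python
import re


def split_skills(raw):
    """Split a skills string on commas, semicolons, bullets, or newlines."""
    skills = []
    for part in re.split(r"[,;\n\u2022]+", raw):
        # Remove surrounding whitespace and a leading "-" or "*" list marker.
        skill = part.strip().lstrip("-*").strip()
        if skill and skill not in skills:  # drop blanks and exact repeats
            skills.append(skill)
    return skills


def to_skills_field(raw):
    """Return the comma-separated form used by the skills column."""
    return ", ".join(split_skills(raw))


messy = "\u2022 Python\n\u2022 SQL\n- Tableau; Excel"
print(to_skills_field(messy))
```

Skills that themselves contain commas (for example a long descriptive phrase) would be split incorrectly, which is one reason to keep skill descriptions short in the source.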
Issue 5: Multiple Users Mixed
Symptom: Information from different users combined
Solution:
- Ensure clear separation between profiles
- Use distinct headers for each user
- Number or label user sections
- Avoid ambiguous content placement
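One way to confirm that profiles are clearly separated is to split the source text on its numbered headers before extraction and inspect each section. The sketch below assumes headers of the form "1. Name", as in Example 2. Any other line that starts with a number and a period would also be treated as a header, so the pattern may need adjusting.

```python
import re

# Matches lines such as "1. Jane Doe" (assumed header convention).
HEADER = re.compile(r"^\s*\d+\.\s+(.+)$")


def split_profiles(text):
    """Split text into sections, one per numbered header line."""
    sections = []
    current = None
    for line in text.splitlines():
        match = HEADER.match(line)
        if match:
            current = {"label": match.group(1).strip(), "lines": []}
            sections.append(current)
        elif current is not None:
            current["lines"].append(line)
        # Lines before the first header (such as a document title) are skipped.
    return sections


document = """EMPLOYEE RECORDS
1. Jane Doe
Email: jane.doe@company.com
Title: Marketing Manager
2. Bob Johnson
Email: bob.johnson@company.com
Title: Software Engineer
"""
sections = split_profiles(document)
for section in sections:
    print(section["label"], len(section["lines"]))
```

If a section comes back with no email line, or with two, the source likely needs clearer separation before it is passed to the agent.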
FAQ
Q1: How many users can be extracted at once?
A: The agent can handle multiple users in a single document. For optimal performance, batch processing of 10-20 users is recommended.
Q2: What if some fields are missing?
A: The agent only extracts available information. Missing fields are left empty, maintaining data integrity without fabrication.
Q3: Can it handle different resume formats?
A: Yes, the agent adapts to various formats including chronological, functional, and combination resumes.
Q4: How are skills extracted and formatted?
A: Skills are extracted as comma-separated values. Complex skill descriptions may need preprocessing.
Q5: Can work experience span multiple companies?
A: Yes, each position is extracted as a separate row in the work experience CSV, linked by email.
Q6: Is date standardization automatic?
A: The agent attempts to standardize dates to YYYY-MM-DD format but works best with consistent source formatting.
Q7: How does it handle international formats?
A: The agent can process international documents but may need guidance on specific formats (e.g., date conventions).
Q8: Can it extract from scanned documents?
A: Scanned documents require OCR preprocessing. The agent works with text-based content.
Q9: What about data privacy compliance?
A: The agent extracts data as-is. Implementing privacy compliance is the responsibility of the user/organization.
Q10: Can fields be customized?
A: The field structure is fixed, but post-processing can reorganize data as needed.
Q11: How are references handled?
A: References are extracted as text. Detailed reference contact information should be structured clearly in the source.
Q12: Can it parse LinkedIn profiles?
A: Yes, if exported to a readable format (PDF, text). Direct LinkedIn HTML may require preprocessing.