Job description
Veritaz is a leading IT staffing solutions provider in Sweden, committed to advancing individual careers and aiding employers in securing the perfect talent fit.
With a proven track record of successful partnerships with top companies, we have rapidly grown our presence in Sweden, across Europe, and in the USA as a dependable and trusted resource within the IT industry.
Assignment Description:
We are looking for a Senior Traffic Analyst to join our dynamic team.
What you will work on:
- Develop and implement evaluation frameworks, scoring rubrics, and multidimensional assessment models to measure the quality of AI-generated and AI-translated content.
- Create, maintain, and refine datasets, scoring protocols, and language-specific guidelines to ensure accuracy, fluency, and cultural relevance.
- Collaborate with ML engineers and language experts to calibrate evaluators, analyze model errors, and improve assessment logic.
- Conduct prompt testing, model calibration, and scoring validation for Generative AI evaluation processes.
- Identify quality gaps, propose improvements, and contribute to continuous feedback loops that enhance the linguistic performance of AI platforms.
- Support the training of assessment models and contribute to documentation, process development, and adoption of best practices.
What you bring:
- Experience in language quality evaluation, applied linguistics, computational linguistics, or similar language-focused research environments.
- Hands-on experience with LLM evaluation, machine translation assessment, or annotation processes in multilingual contexts.
- Strong understanding of linguistic and translation quality assessment frameworks and metrics, such as MQM, MetricX, or COMET, with experience creating scoring rubrics.
- Familiarity with Generative AI evaluation methods, including prompt testing, model calibration, and scoring validation.
- Experience collaborating with ML teams on data pipelines, annotation workflows, or fine-tuning of linguistic evaluators.
- Deep linguistic and cultural knowledge across multiple markets, with the ability to define and measure content quality for different contexts.
- Advantageous: experience with programmatic QA using Python, YAML, gRPC, or rule-based validation.
- Advantageous: knowledge of inter-rater reliability measures such as Krippendorff’s alpha or Cohen’s kappa, and experience comparing human and AI-generated assessments (a minimal sketch of this kind of analysis follows this list).
- Fluency in English and Swedish.
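
To illustrate the kind of programmatic QA and human–AI agreement analysis mentioned above, here is a minimal, hypothetical Python sketch. It applies a rule-based validity check to rubric scores and computes Cohen's kappa between a human rater and an AI evaluator. The 1–5 rubric scale, function names, and example data are assumptions for illustration only, not part of this assignment.

```python
from collections import Counter

RUBRIC_MIN, RUBRIC_MAX = 1, 5  # assumed 1-5 rubric scale (illustrative)


def validate_scores(scores):
    """Rule-based check: every score must be an integer on the rubric scale."""
    bad = [s for s in scores if not (isinstance(s, int) and RUBRIC_MIN <= s <= RUBRIC_MAX)]
    if bad:
        raise ValueError(f"Out-of-range or non-integer scores: {bad}")


def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa = (p_o - p_e) / (1 - p_e): chance-corrected agreement."""
    assert len(rater_a) == len(rater_b) and rater_a, "need two equal-length, non-empty score lists"
    n = len(rater_a)
    p_o = sum(a == b for a, b in zip(rater_a, rater_b)) / n  # observed agreement
    freq_a, freq_b = Counter(rater_a), Counter(rater_b)
    labels = set(freq_a) | set(freq_b)
    p_e = sum((freq_a[l] / n) * (freq_b[l] / n) for l in labels)  # agreement expected by chance
    return 1.0 if p_e == 1.0 else (p_o - p_e) / (1 - p_e)


if __name__ == "__main__":
    # Hypothetical example: human vs. AI evaluator scores for ten segments.
    human = [5, 4, 4, 3, 5, 2, 4, 3, 5, 4]
    model = [5, 4, 3, 3, 5, 2, 4, 4, 5, 4]
    validate_scores(human)
    validate_scores(model)
    print(f"Cohen's kappa: {cohens_kappa(human, model):.2f}")
```

In practice, library implementations such as sklearn.metrics.cohen_kappa_score or a Krippendorff's alpha package could replace the hand-rolled function; the point is simply that agreement between human and AI assessments is made measurable and repeatable.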