AI Research Engineer (Model Evaluation - 100% remote Spain)

Remote - Madrid, Comunidad de Madrid, Spain
Data

Job description

Join Tether and Shape the Future of Digital Finance

At Tether, we’re not just building products; we’re pioneering a global financial revolution. Our cutting-edge solutions empower businesses, from exchanges and wallets to payment processors and ATMs, to seamlessly integrate reserve-backed tokens across blockchains. By harnessing the power of blockchain technology, Tether enables you to store, send, and receive digital tokens instantly, securely, and globally, all at a fraction of the cost. Transparency is the bedrock of everything we do, ensuring trust in every transaction.

Innovate with Tether

Tether Finance: Our innovative product suite features the world’s most trusted stablecoin, USDT, relied upon by hundreds of millions worldwide, alongside pioneering digital asset tokenization services.

But that’s just the beginning:

Tether Power: Driving sustainable growth, our energy solutions optimize excess power for Bitcoin mining using eco-friendly practices in state-of-the-art, geo-diverse facilities.

Tether Data: Fueling breakthroughs in AI and peer-to-peer technology, we reduce infrastructure costs and enhance global communications with cutting-edge solutions like KEET, our flagship app that redefines secure and private data sharing.

Tether Education: Democratizing access to top-tier digital learning, we empower individuals to thrive in the digital and gig economies, driving global growth and opportunity.

Tether Evolution: At the intersection of technology and human potential, we are pushing the boundaries of what is possible, crafting a future where innovation and human capabilities merge in powerful, unprecedented ways.

Why Join Us?

Our team is a global talent powerhouse, working remotely from every corner of the world. If you’re passionate about making a mark in the fintech space, this is your opportunity to collaborate with some of the brightest minds, pushing boundaries and setting new standards. We’ve grown fast, stayed lean, and secured our place as a leader in the industry.

If you have excellent English communication skills and are ready to contribute to the most innovative platform on the planet, Tether is the place for you.

Are you ready to be part of the future?

About the job

As a member of our AI model team, you will drive innovation across the entire AI lifecycle by developing and implementing rigorous evaluation frameworks and benchmark methodologies for pre-training, post-training, and inference. Your work will focus on designing metrics and assessment strategies that ensure our models are highly responsive, efficient, and reliable across real-world applications. You will work on a wide spectrum of systems, from resource-efficient models designed for limited hardware environments to complex, multi-modal architectures that integrate text, images, and audio.

We expect you to have deep expertise in advanced model architectures, pre-training and post-training practices, and inference evaluation frameworks. Adopting a hands-on, research-driven approach, you will develop, test, and implement novel evaluation strategies that rigorously track key performance indicators such as accuracy, latency, throughput, and memory footprint. Your evaluations will not only benchmark model performance at each stage, from the foundational pre-training phase through targeted post-training refinements to final inference, but will also provide actionable insights.

A key element of this role is collaborating with cross-functional teams, including product management, engineering, and operations, to share your evaluation findings and integrate stakeholder feedback. You will engineer robust evaluation pipelines and performance dashboards that serve as a common reference point for all stakeholders, ensuring that the insights drive continuous improvement in model deployment strategies. The ultimate goal is to set industry-leading standards for AI model quality and reliability, delivering scalable performance and tangible value in dynamic, real-world scenarios.

Responsibilities:

Develop, test, and deploy integrated frameworks that rigorously assess models during pre-training, post-training, and inference. Define and track key performance indicators such as accuracy, loss metrics, latency, throughput, and memory footprint across diverse deployment scenarios.
Curate high-quality evaluation datasets and design standardized benchmarks to reliably measure model quality and robustness. Ensure that these benchmarks accurately reflect improvements achieved through both pre-training and post-training processes, and drive consistency in evaluation practices.
Engage with product management, engineering, data science, and operations teams to align evaluation metrics with business objectives. Present evaluation findings, actionable insights, and recommendations through comprehensive dashboards and reports that support decision-making across functions.
Systematically analyze evaluation data to identify and resolve bottlenecks across the model lifecycle. Propose and implement optimizations that enhance model performance, scalability, and resource utilization on resource-constrained platforms, ensuring efficient pre-training, post-training, and inference.
Conduct iterative experiments and empirical research to refine evaluation methodologies, staying abreast of emerging techniques and trends. Leverage insights to continuously enhance benchmarking practices and improve overall model reliability, ensuring that all stages of the model lifecycle deliver measurable value in real-world applications.

Job requirements

A degree in Computer Science or a related field, ideally a PhD in NLP, Machine Learning, or a related field, complemented by a solid track record in AI R&D (with strong publications at A* conferences).
Demonstrated experience in designing and evaluating AI models at multiple stages, from pre-training through post-training to inference. You should be proficient in developing evaluation frameworks that rigorously assess accuracy, convergence, loss improvements, and overall model robustness, ensuring each stage of the AI lifecycle delivers measurable real-world value.
Strong programming skills and hands-on expertise in evaluation benchmarks and frameworks are essential. You should be familiar with building, automating, and scaling complex evaluation and benchmarking pipelines, and experienced with performance metrics such as latency, throughput, and memory footprint.
Proven ability to conduct iterative experiments and empirical research that drive the continuous refinement of evaluation methodologies. You should be adept at staying abreast of emerging trends and techniques, leveraging insights to enhance benchmarking practices and model reliability.
Demonstrated experience collaborating with diverse teams such as product, engineering, and operations in order to align evaluation strategies with organizational goals. You must be skilled at translating technical findings into actionable insights for stakeholders and driving process improvements across the model development lifecycle.

Questions

Why have you applied for this position?

Describe your experience in developing and implementing comprehensive evaluation frameworks for both pre-training and post-training phases. What key metrics did you track, and how did these metrics influence subsequent model adjustments?

What innovative techniques or tools have you used to benchmark inference performance in models? Please provide concrete examples and measurable outcomes from your previous work.

Describe a project where you conducted iterative experiments to refine evaluation methodologies across the AI model lifecycle. What challenges did you face in aligning pre-training and post-training evaluations, and what improvements did you achieve in overall model performance and reliability?

Explain how you have collaborated with cross-functional teams (e.g., engineering, product management, and operations) to integrate evaluation insights into model development. How did your benchmarking and evaluation reports translate into actionable changes that enhanced the quality and deployment of AI models?

Have you published research on model evaluation or related topics at top-tier conferences such as ICLR, NeurIPS, ICML, ACL, or IJCAI? Describe your key contributions in these publications, and include links to your papers.

When would you be available to start working with us?

LinkedIn profile

Should you be considered for the role, what country will you primarily be working from?

What's your expected annual salary for this role? (in USD)
