
AI Research Engineer (Model Evaluation - 100% remote Spain)

  • Remote
    • Madrid, Comunidad de Madrid, Spain
  • Data

Job description

Join Tether and Shape the Future of Digital Finance

At Tether, we’re not just building products, we’re pioneering a global financial revolution. Our cutting-edge solutions empower businesses—from exchanges and wallets to payment processors and ATMs—to seamlessly integrate reserve-backed tokens across blockchains. By harnessing the power of blockchain technology, Tether enables you to store, send, and receive digital tokens instantly, securely, and globally, all at a fraction of the cost. Transparency is the bedrock of everything we do, ensuring trust in every transaction.

Innovate with Tether

Tether Finance: Our innovative product suite features the world’s most trusted stablecoin, USDT, relied upon by hundreds of millions worldwide, alongside pioneering digital asset tokenization services.

But that’s just the beginning:

Tether Power: Driving sustainable growth, our energy solutions optimize excess power for Bitcoin mining using eco-friendly practices in state-of-the-art, geo-diverse facilities.

Tether Data: Fueling breakthroughs in AI and peer-to-peer technology, we reduce infrastructure costs and enhance global communications with cutting-edge solutions like KEET, our flagship app that redefines secure and private data sharing.

Tether Education: Democratizing access to top-tier digital learning, we empower individuals to thrive in the digital and gig economies, driving global growth and opportunity.

Tether Evolution: At the intersection of technology and human potential, we are pushing the boundaries of what is possible, crafting a future where innovation and human capabilities merge in powerful, unprecedented ways.

Why Join Us?

Our team is a global talent powerhouse, working remotely from every corner of the world. If you’re passionate about making a mark in the fintech space, this is your opportunity to collaborate with some of the brightest minds, pushing boundaries and setting new standards. We’ve grown fast, stayed lean, and secured our place as a leader in the industry.

If you have excellent English communication skills and are ready to contribute to the most innovative platform on the planet, Tether is the place for you.

Are you ready to be part of the future?

About the job

As a member of our AI model team, you will drive innovation across the entire AI lifecycle by developing and implementing rigorous evaluation frameworks and benchmark methodologies for pre-training, post-training, and inference. Your work will focus on designing metrics and assessment strategies that ensure our models are highly responsive, efficient, and reliable across real-world applications. You will work on a wide spectrum of systems, from resource-efficient models designed for limited hardware environments to complex, multi-modal architectures that integrate text, images, and audio.

We expect you to have deep expertise in advanced model architectures, pre-training and post-training practices, and inference evaluation frameworks. Adopting a hands-on, research-driven approach, you will develop, test, and implement novel evaluation strategies that rigorously track key performance indicators such as accuracy, latency, throughput, and memory footprint. Your evaluations will not only benchmark model performance at each stage, from the foundational pre-training phase to targeted post-training refinements and final inference, but will also provide actionable insights.

A key element of this role is collaborating with cross-functional teams, including product management, engineering, and operations, to share your evaluation findings and integrate stakeholder feedback. You will engineer robust evaluation pipelines and performance dashboards that serve as a common reference point for all stakeholders, ensuring that the insights drive continuous improvement in model deployment strategies. The ultimate goal is to set industry-leading standards for AI model quality and reliability, delivering scalable performance and tangible value in dynamic, real-world scenarios.

Responsibilities:

  • Develop, test, and deploy integrated frameworks that rigorously assess models during pre-training, post-training, and inference. Define and track key performance indicators such as accuracy, loss metrics, latency, throughput, and memory footprint across diverse deployment scenarios.

  • Curate high-quality evaluation datasets and design standardized benchmarks to reliably measure model quality and robustness. Ensure that these benchmarks accurately reflect improvements achieved through both pre-training and post-training processes, and drive consistency in evaluation practices.

  • Engage with product management, engineering, data science, and operations teams to align evaluation metrics with business objectives. Present evaluation findings, actionable insights, and recommendations through comprehensive dashboards and reports that support decision-making across functions.

  • Systematically analyze evaluation data to identify and resolve bottlenecks across the model lifecycle. Propose and implement optimizations that enhance model performance, scalability, and resource utilization on resource-constrained platforms, ensuring efficient pre-training, post-training, and inference.

  • Conduct iterative experiments and empirical research to refine evaluation methodologies, staying abreast of emerging techniques and trends. Leverage insights to continuously enhance benchmarking practices and improve overall model reliability, ensuring that all stages of the model lifecycle deliver measurable value in real-world applications.

Job requirements

  • A degree in Computer Science or a related field, ideally a PhD in NLP, Machine Learning, or a related field, complemented by a solid track record in AI R&D (with strong publications at A* conferences).

  • Demonstrated experience in designing and evaluating AI models at multiple stages, from pre-training through post-training to inference. You should be proficient in developing evaluation frameworks that rigorously assess accuracy, convergence, loss improvements, and overall model robustness, ensuring each stage of the AI lifecycle delivers measurable real-world value.

  • Strong programming skills and hands-on expertise in evaluation benchmarks and frameworks are essential. You should be familiar with building, automating, and scaling complex evaluation and benchmarking pipelines, and experienced with performance metrics such as latency, throughput, and memory footprint.

  • Proven ability to conduct iterative experiments and empirical research that drive the continuous refinement of evaluation methodologies. You should be adept at staying abreast of emerging trends and techniques, leveraging insights to enhance benchmarking practices and model reliability.

  • Demonstrated experience collaborating with diverse teams such as product, engineering, and operations in order to align evaluation strategies with organizational goals. You must be skilled at translating technical findings into actionable insights for stakeholders and driving process improvements across the model development lifecycle.
