About me

Yintao Tai is a PhD student in the UKRI-funded Centre for Doctoral Training (CDT) in Designing Responsible Natural Language Processing at the University of Edinburgh, where he is supervised by Prof. Frank Keller and Prof. Antonio Vergari. He received his BEng in Electronics and Computer Science and his MSc in Computer Science, both from the University of Edinburgh.

Before starting his PhD, he worked as a machine learning engineer at ByteDance, where he developed multimodal methods that enhanced large-scale recommendation systems. This industry experience shaped his interest in building scalable and efficient multimodal models.

Research interests

Yintao’s research focuses on efficient multimodal learning, particularly on enabling large language models to process and understand videos without excessive computation. This line of work supports the deployment of multimodal systems at large scale.

He is also interested in text-in-image understanding and developed PIXAR, a model that can both recognise and generate text embedded in images. His work advances practical multimodal methods for applications such as video summarisation, retrieval, and recommendation.

Beyond these areas, Yintao is exploring how efficient video LLMs can be applied to robotics. Alongside technical contributions, he is committed to addressing ethical and responsible AI challenges, including fairness, transparency, and the mitigation of potential societal risks when deploying multimodal systems.