Model Fine-tuning: Fine-tune LLM/VLM/VLA foundation models to explore their planning and decision-making capabilities on instrument tasks, and turn the findings into verifiable technical roadmaps
Data & Evaluation System: Build training datasets, offline evaluation suites, and regression checks so that every iteration is measured quantitatively
Fine-tuning & Alignment: Apply supervised fine-tuning (SFT) and parameter-efficient fine-tuning hands-on to improve task success rate and stability
RL Practice: Design reward functions and feedback loops that make policy optimization more effective on real-world tasks
Engineering Collaboration: Work with engineering teams to ensure models can be deployed, monitored, and improved iteratively in production systems