Member of Technical Staff, Large Generative Models
Company: Captions, LLC.
Location: New York
Posted on: May 20, 2025
Job Description:
Captions is the leading video AI company, building the future of
video creation. Over 10 million creators and businesses have used
Captions to create videos for social media, marketing, sales, and
more. We're on a mission to serve the next billion.

We are a rapidly growing team of ambitious, experienced, and devoted
engineers, researchers, designers, marketers, and operators based in
NYC. You'll join an early team and have an outsized impact on the
product and the company's culture.

We're very fortunate to have some of the best investors and
entrepreneurs backing us, including Index Ventures (Series C lead),
Kleiner Perkins (Series B lead), Sequoia Capital (Series A and Seed
co-lead), Andreessen Horowitz (Series A and Seed co-lead), Uncommon
Projects, Kevin Systrom, Mike Krieger, Lenny Rachitsky, Antoine
Martin, Julie Zhuo, Ben Rubin, Jaren Glover, SVAngel, 20VC, Ludlow
Ventures, Chapter One, and more.

Check out some of our press coverage: The Information, Fast Company,
The New York Times, Business Insider, and Time.

Please note that all of our roles require you to be in person at our
NYC HQ (located in Union Square). We do not work with third-party
recruiting agencies; please do not contact us.

About the role:
Captions is
seeking an exceptional Research Engineer (MOTS) to advance the
state-of-the-art in large-scale multimodal video diffusion models.
You'll conduct novel research on generative modeling architectures,
develop new training techniques, and scale models to billions of
parameters. As a key member of our ML Research team, you'll work at
the cutting edge of multimodal generation while building systems
that enable natural, controllable video creation. We're already
training large-scale models with demonstrated product impact, and
we're excited to continue expanding the scope and capabilities of
our research.

We're especially excited about pushing the boundaries
of audio-video generation, with a focus on realistic and
charismatic human behavior that enables natural storytelling and
creative iteration. Our models power creative tools used by
millions of creators, and we're tackling fundamental challenges in
how to generate compelling human motion, expression, and speech.

Key Responsibilities:

Research & Architecture Development:
- Design and implement novel architectures for large-scale video
and multimodal diffusion models
- Develop new approaches to multimodal fusion, temporal modeling,
and video control
- Research temporal video editing techniques and controllable
generation
- Research and validate scaling laws for video generation
models
- Create new loss functions and training objectives for improved
generation quality
- Drive rapid experimentation with model architectures and
training strategies
- Validate research directly through product deployment and user
feedback

Model Training & Optimization:
- Train and optimize models at massive scale (10s-100s of
billions of parameters)
- Develop sophisticated distributed training approaches using
FSDP, DeepSpeed, and Megatron-LM
- Design and implement model surgery techniques (pruning,
distillation, quantization)
- Create new approaches to memory optimization and training
efficiency
- Research techniques for improving training stability at
scale
- Conduct systematic empirical studies of architecture and
optimization choices

Technical Innovation:
- Advance the state of the art in video model architecture design and
optimization
- Develop new approaches to temporal modeling for video
generation
- Create novel solutions for multimodal learning and cross-modal
alignment
- Research and implement new optimization techniques for
generative modeling and sampling
- Design and validate new evaluation metrics for generation
quality
- Systematically analyze and improve model behavior across
different regimes

Requirements:

Research Experience:
- Master's or PhD in Computer Science, Machine Learning, or
related field
- Track record of research contributions at top ML conferences
(NeurIPS, ICML, ICLR)
- Demonstrated experience implementing and improving upon
state-of-the-art architectures
- Deep expertise in generative modeling approaches (diffusion,
autoregressive, VAEs, etc.)
- Strong background in optimization techniques and loss function
design
- Experience with empirical scaling studies and systematic
architecture research

Technical Expertise:
- Strong proficiency in modern deep learning tooling (PyTorch,
CUDA, Triton, FSDP, etc.)
- Experience training diffusion models with 10B+ parameters
- Experience with very large language models (200B+ parameters)
is a plus
- Deep understanding of attention, transformers, and modern
multimodal architectures
- Expertise in distributed training systems and model
parallelism
- Proven ability to implement and improve complex model
architectures
- Track record of systematic empirical research and rigorous
evaluation

Engineering Capabilities:
- Ability to write clean, modular research code that scales
- Strong software engineering practices including testing and
code review
- Experience with rapid prototyping and experimental design
- Strong analytical skills for debugging model behavior and
training dynamics
- Facility with profiling and optimization tools
- Track record of bringing research ideas to production
- Experience maintaining high code quality in a research
environment

About the Team:
You'll work directly alongside our
research and engineering teams in our NYC office. We've
intentionally built a culture where technical innovation and
research excellence are highly valued; your success will be
measured by your contributions to improving our models and
advancing the field, not by your ability to navigate politics.
We're a team that loves diving deep into complex technical problems
and emerging with practical breakthroughs.

Our team values:
- Open technical discussions and collaboration
- Rapid iteration and practical solutions
- Deep technical expertise and continuous learning
- Direct impact on research and product outcomes

What sets us apart:
- Opportunity to advance the state-of-the-art in video
generation
- Direct impact on products used by millions of creators
- Access to massive compute resources and diverse, large-scale
datasets
- Environment that values both research excellence and practical
impact
- Ability to validate research through direct product
feedback

Benefits:
- Comprehensive medical, dental, and vision plans
- 401K with employer match
- Commuter Benefits
- Catered lunch multiple days per week
- Dinner stipend every night if you're working late and want a
bite!
- DoorDash DashPass subscription
- Health & Wellness Perks (Talkspace, Kindbody, One Medical
subscription, HealthAdvocate, Teladoc)
- Multiple team offsites per year with team events every
month
- Generous PTO policy

Captions provides equal employment
opportunities to all employees and applicants for employment and
prohibits discrimination and harassment of any type without regard
to race, color, religion, age, sex, national origin, disability
status, genetics, protected veteran status, sexual orientation,
gender identity or expression, or any other characteristic
protected by federal, state, or local laws.

Please note that benefits apply to full-time employees only.