Agent Skills

ai-image-generation

Generate AI images with GPT-Image-2, FLUX, Gemini, Grok, Seedream, Reve and 50+ models via inference.sh CLI. Capabilities: text-to-image, im…

Multi-model image generation: switch between FLUX, Gemini, and Seedream in one click

Posters · Comics · Ads

CORE · video-gen

ai-video-generation

Generate AI videos with Google Veo, Seedance 2.0, HappyHorse, Wan, Grok and 40+ models via inference.sh CLI. Capabilities: text-to-video, im…

Multi-model video generation: switch between Veo, Seedance, and Wan in one click

Short dramas · Ads · Digital humans

algorithmic-art

Creating algorithmic art using p5.js with seeded randomness and interactive parameter exploration. Use this when users request creating art …

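Generative poster backgrounds: one-of-a-kind visuals driven by an algorithmic philosophy

Creative visuals · Posters · Comics

CORE · Audio generation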

audiocraft-audio-generation

PyTorch library for audio generation including text-to-music (MusicGen) and text-to-sound (AudioGen). Use when you need to generate music fr…

Short-drama BGM: generate a theme melody from each scene's mood in one click

MVs · Ads · Short dramas

CORE · Visual understanding

blip-2-vision-language

Vision-language pre-training framework bridging frozen image encoders and LLMs. Use when you need image captioning, visual question answerin…

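Actor asset tagging: automatically caption actor images in natural language for easier search

Asset management · Search

CORE · Speech transcription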

byted-las-asr-pro

ASR / STT / speech recognition / voice recognition engine powered by Volcengine LAS. Transcribes and converts speech to text from audio and …

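Professional transcription: high-accuracy speech-to-text for long audio with Doubao ASR Pro

Subtitles · Film & TV · Short dramas

CORE · Audio generation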

byted-las-audio-extract-and-split

Extracts audio tracks from video files and splits long audio into timed segments using Volcengine LAS. Audio extraction and separation from …

byted-las-video-edit

Extracts and clips video segments from long videos using natural language descriptions. AI-powered smart video editing, video trimming, and …

byted-las-video-inpaint

Removes unwanted visual elements from videos using AI-powered inpainting via Volcengine LAS. Video watermark removal, subtitle removal, logo…

Watermark/logo removal: erase unwanted elements from video

Film & TV · Short dramas · Post-production

byted-mediakit-videoedit

AI Video Intelligent Editing Skill. Input video file paths (supports multiple), optional danmaku file paths, optional subtitle file paths, c…

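Danmaku and subtitle compositing: AI video editing plus danmaku overlay, producing a finished video in one click

Short dramas · Film & TV · Ads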

byted-mediakit-voiceover-editing

Volcano Engine AI MediaKit talking-head video editing Skill: a one-stop workflow from environment setup through media management, audio proc…

byted-music-generate

Generate music using Volcengine Imagination API. Supports vocal songs, instrumental BGM, and lyrics generation. Use when the user wants to c…

Short-drama BGM: generate background music from each scene's mood in one click

MVs · Ads · Short dramas

CORE · video-gen

byted-seedance-video-generate

Generate videos using Seedance models. Invoke when user wants to create videos from text prompts, images, or reference materials.

Animate short-drama storyboards: Seedance image-to-video turns static storyboard frames into motion

Short dramas · Ads · Digital humans

byted-seedream-image-generate

Generate high-quality images from text prompts using Volcano Engine Seedream models. Supports multiple artistic styles and aspect ratios. Us…

Chinese-language image generation: Seedream takes native Chinese prompts and outputs poster backgrounds directly

Posters · Comics · Ads

CORE · speech-gen

byted-text-to-speech

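Synthesizes text into speech (TTS) using the Volcengine Doubao speech synthesis API. Supports streaming synthesis, multiple voices, speed/pitch/volume adjustment, Markdown filtering, and reading LaTeX formulas aloud. When the user needs to convert text to speech, generate read-aloud audio, dubbing, narration, or announcements, or mentions "text-to-speech", "TTS", "speech synthesis", "read aloud", or "dubbing", use this…

Digital-human voiceover: Doubao TTS generates a dedicated voice for each character card

Digital humans · Voiceover · Subtitles

CORE · Speech transcription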

byted-voice-to-text

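Speech-to-text (ASR). Recognizes speech with Volcengine BigModel ASR in two modes: Express (≤2 h / 100 MB, fast synchronous response) and Standard (≤5 h, asynchronous recognition). Supports Feishu voice messages, local audio files, and audio URLs. Use this skill whenever a voice message or audio attachment (.ogg/.mp3/.wav) is received.

Automatic subtitles: Doubao ASR attaches multilingual subtitles directly to the timeline

Subtitles · Short dramas · Digital humans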

canvas-design

Create beautiful visual art in .png and .pdf documents using design philosophy. You should use this skill when the user asks to create a pos…

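Poster design: the Agent applies a design philosophy to automatically generate brand-grade key visuals

Posters · Illustration · Ads

CORE · Visual understanding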

clip

OpenAI's model connecting vision and language. Enables zero-shot image classification, image-text matching, and cross-modal retrieval. Train…

elevenlabs-tts

ElevenLabs text-to-speech with 22+ premium voices, multilingual support, and voice tuning via inference.sh CLI. Models: eleven_multilingual_…

Digital-human voiceover: high-quality speech with ElevenLabs' 22+ voices

Digital humans · Voiceover · MVs

gsap

GSAP animation reference for HyperFrames. Covers gsap.to(), from(), fromTo(), easing, stagger, defaults, timelines (gsap.timeline(), positio…

Motion design: orchestrate complex animation sequences with GSAP timelines

Motion graphics · Ads · Comics

CORE · Copywriting

humanizer-zh

De-robotize AI copy: make ad and short-drama dialogue sound more natural

Scripts · Copy · Subtitles

hyperframes

Create video compositions, animations, title cards, overlays, captions, voiceovers, audio-reactive visuals, and scene transitions in HyperFr…

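Video composition: HTML is the video, so the Agent writes HTML and the video is rendered automatically

Content creation · Ads · Short dramas

CORE · Multimodal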

llava

Large Language and Vision Assistant. Enables visual instruction tuning and image-based conversations. Combines CLIP vision encoder with Vicu…

Storyboard review: let the Agent look at frames and suggest revisions

Creative assistant · Review

CORE · Asset library

pexels-media

Source royalty-free images and videos from Pexels API for design, placeholders, or content. Supports search, curated/popular content, collec…

remotion-best-practices

Best practices for Remotion - Video creation in React. Use this skill whenever you are dealing with Remotion code to obtain domain-specific …

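Video creation in React: a component-based video production pipeline with Remotion

Content creation · Ads · Short dramas

CORE · Visual understanding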

segment-anything-model

Foundation model for image segmentation with zero-shot transfer. Use when you need to segment any object in images using points, boxes, or m…

stable-diffusion-image-generation

State-of-the-art text-to-image generation with Stable Diffusion models via HuggingFace Diffusers. Use when generating images from text promp…

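Actor portrait generation: works with the consistent faces from the actors' guild to produce multi-angle costume test shots

Film & TV · Posters · Comics

CORE · Speech transcription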

whisper

OpenAI's general-purpose speech recognition model. Supports 99 languages, transcription, translation to English, and language identification…

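Automatic subtitles: multilingual subtitles attached directly to timeline nodes

Subtitles · Short dramas · Digital humans

CORE · Copywriting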

writing-skills

Use when creating new skills, editing existing skills, or verifying skills work before deployment

Short-drama script outlines: give the Agent a standardized writing scaffold

Scripts · Ads

youtube-clipper

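Reference clip download: grab reference shots into the asset library in one click

Short dramas · Footage

🤖 Middle Ring · Agent Orchestration

Agent workflow orchestration, observability, and structured output

19 items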

agent-browser

Browser automation CLI for AI agents. Use when the user needs to interact with websites, including navigating pages, filling forms, clicking…

Web asset scraping: the Agent browses pages on its own to find references

MID · Copywriting

brainstorming

You MUST use this before any creative work - creating features, building components, adding functionality, or modifying behavior. Explores u…

byted-byteplus-vod-video-enhancement

Upload video/audio media to BytePlus VOD (Video on Demand) storage, returning the Vid and playback references; supports both local file uplo…

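Video enhancement: super-resolution, denoising, and sharpening to improve picture quality

Film & TV · Post-production

MID · Audio generation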

byted-las-audio-convert

Converts and transcodes audio file formats and encoding parameters using Volcengine LAS. Audio format conversion between wav, mp3, flac, m4a…

Format conversion: batch transcoding from wav to mp3/flac

Audio processing · Short dramas

MID · Video tools

byted-las-video-resize

Resizes, scales, and adjusts video resolution and dimensions using GPU-accelerated NVENC encoding via Volcengine LAS. Video resizing, video …

byted-las-vlm-video

Analyzes and understands video content using Volcengine LAS Doubao vision-language models (VLM). Multimodal AI video analysis, video compreh…

Video content understanding: Doubao VLM automatically describes video content

Review · Creative assistant

MID · Copywriting

copywriting

When the user wants to write, rewrite, or improve marketing copy for any page — including homepage, landing pages, pricing pages, feature pa…

Marketing copy: copy for multiple pages such as Homepage, Landing, and Pricing

Ads · Copy

crewai-multi-agent

Multi-agent orchestration framework for autonomous AI collaboration. Use when building teams of specialized agents working together on compl…

Multi-role Agent writers' room: screenwriter, director, and editor roles split up the workflow

dspy

Build complex AI systems with declarative programming, optimize prompts automatically, create modular RAG systems and agents with DSPy - Sta…

Automatic prompt optimization: train prompts as if they were parameters

MID · Audio generation

elevenlabs-music

ElevenLabs AI music generation - create original music from text prompts via inference.sh CLI. Capabilities: text-to-music, custom duration …

guidance

Control LLM output with regex and grammars, guarantee valid JSON/XML/code generation, enforce structured formats, and build multi-step workf…

Controllable generation templates: in-template interpolation plus branching

instructor

Extract structured data from LLM responses with Pydantic validation, retry failed extractions automatically, parse complex JSON with type sa…

Strongly constrained structured output: keep the Agent's storyboard tables reliably parseable

langchain

Framework for building LLM-powered applications with agents, chains, and RAG. Supports multiple providers (OpenAI, Anthropic, Google), 500+ …

Agent toolchain orchestration: chain together model calls, retrieval, and external APIs

Agent OS · Workflows

MID · Observability

langsmith-observability

LLM observability platform for tracing, evaluation, and monitoring. Use when debugging LLM applications, evaluating model outputs against da…

Agent tracing: review which models were called to produce a single finished video

Agent OS · Debugging

MID · Retrieval augmentation

llamaindex

Data framework for building LLM applications with RAG. Specializes in document ingestion (300+ connectors), indexing, and querying. Features…

Script knowledge base RAG: feed the Agent the project's past scripts

Asset management · Search

outlines

Guarantee valid JSON/XML/code structure during generation, use Pydantic models for type-safe outputs, support local models (Transformers, vL…

JSON/regex-constrained generation: guarantee that downstream nodes can consume the output directly

MID · content-gen

pptx

Use this skill any time a .pptx file is involved in any way -- as input, output, or both. This includes: creating slide decks, pitch decks, …

Brand presentations: automatically generate product pitch decks as PPTX

Ads · Presentations · Content creation

MID · Video tools

website-to-hyperframes

Capture a website and create a HyperFrames video from it. Use when: (1) a user provides a URL and wants a video, (2) someone says 'capture t…

Website promo video: turn a landing page into a product tour video in one click

Ads · Content creation

MID · content-publish

xiaohongshu-mcp

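Xiaohongshu (RED/XHS) automation assistant. Provides full Xiaohongshu operations: login, publishing image/text and video posts, searching notes, viewing note details, liking/saving/commenting, viewing creator profiles, and content planning. Use this skill whenever the user mentions Xiaohongshu, Hongshu, XHS, RED, posting notes, searching notes, Xiaohongshu account operations, or any other Xiaohongshu-related action, even if the user doesn't explicitly say Xiaohongshu but…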