Agent 技能
符合 Claude Code Skills 规范的精选能力清单,按「创意生产 → Agent 编排 → 训练部署」三圈层组织。每条都标注在 saker 影视城里的具体用法。
🎬 核心圈 · 创意生产
直接出片、出图、出声 — 创作主战场
ai-image-generation
Generate AI images with GPT-Image-2, FLUX, Gemini, Grok, Seedream, Reve and 50+ models via inference.sh CLI. Capabilities: text-to-image, im…
ai-video-generation
Generate AI videos with Google Veo, Seedance 2.0, HappyHorse, Wan, Grok and 40+ models via inference.sh CLI. Capabilities: text-to-video, im…
algorithmic-art
Creating algorithmic art using p5.js with seeded randomness and interactive parameter exploration. Use this when users request creating art …
audiocraft-audio-generation
PyTorch library for audio generation including text-to-music (MusicGen) and text-to-sound (AudioGen). Use when you need to generate music fr…
blip-2-vision-language
Vision-language pre-training framework bridging frozen image encoders and LLMs. Use when you need image captioning, visual question answerin…
byted-las-asr-pro
ASR / STT / speech recognition / voice recognition engine powered by Volcengine LAS. Transcribes and converts speech to text from audio and …
byted-las-audio-extract-and-split
Extracts audio tracks from video files and splits long audio into timed segments using Volcengine LAS. Audio extraction and separation from …
byted-las-video-edit
Extracts and clips video segments from long videos using natural language descriptions. AI-powered smart video editing, video trimming, and …
byted-las-video-inpaint
Removes unwanted visual elements from videos using AI-powered inpainting via Volcengine LAS. Video watermark removal, subtitle removal, logo…
byted-mediakit-videoedit
AI Video Intelligent Editing Skill. Input video file paths (supports multiple), optional danmaku file paths, optional subtitle file paths, c…
byted-mediakit-voiceover-editing
Volcano Engine AI MediaKit talking-head video editing Skill: a one-stop workflow from environment setup through media management, audio proc…
byted-music-generate
Generate music using Volcengine Imagination API. Supports vocal songs, instrumental BGM, and lyrics generation. Use when the user wants to c…
byted-seedance-video-generate
Generate videos using Seedance models. Invoke when user wants to create videos from text prompts, images, or reference materials.
byted-seedream-image-generate
Generate high-quality images from text prompts using Volcano Engine Seedream models. Supports multiple artistic styles and aspect ratios. Us…
byted-text-to-speech
将文本合成为语音(TTS)。使用火山引擎豆包语音合成 API,支持流式合成、多种音色、语速/音调/音量调节、Markdown 过滤和 LaTeX 公式播报。当用户需要把文字转成语音、生成朗读音频、配音、旁白、播报,或提到「文字转语音」「TTS」「语音合成」「朗读」「配音」时使用本…
byted-voice-to-text
语音转文字(ASR)。使用火山引擎 BigModel ASR 识别语音,包含极速版(≤2h/100MB 同步快速返回)和标准版(≤5h 异步识别)两种模式。支持飞书语音消息、本地音频文件及音频 URL。当收到语音消息或音频附件(.ogg/.mp3/.wav)时使用本技能。
canvas-design
Create beautiful visual art in .png and .pdf documents using design philosophy. You should use this skill when the user asks to create a pos…
clip
OpenAI's model connecting vision and language. Enables zero-shot image classification, image-text matching, and cross-modal retrieval. Train…
elevenlabs-tts
ElevenLabs text-to-speech with 22+ premium voices, multilingual support, and voice tuning via inference.sh CLI. Models: eleven_multilingual_…
gsap
GSAP animation reference for HyperFrames. Covers gsap.to(), from(), fromTo(), easing, stagger, defaults, timelines (gsap.timeline(), positio…
humanizer-zh
hyperframes
Create video compositions, animations, title cards, overlays, captions, voiceovers, audio-reactive visuals, and scene transitions in HyperFr…
llava
Large Language and Vision Assistant. Enables visual instruction tuning and image-based conversations. Combines CLIP vision encoder with Vicu…
pexels-media
Source royalty-free images and videos from Pexels API for design, placeholders, or content. Supports search, curated/popular content, collec…
remotion-best-practices
Best practices for Remotion - Video creation in React. Use this skill whenever you are dealing with Remotion code to obtain domain-specific …
segment-anything-model
Foundation model for image segmentation with zero-shot transfer. Use when you need to segment any object in images using points, boxes, or m…
stable-diffusion-image-generation
State-of-the-art text-to-image generation with Stable Diffusion models via HuggingFace Diffusers. Use when generating images from text promp…
whisper
OpenAI's general-purpose speech recognition model. Supports 99 languages, transcription, translation to English, and language identification…
writing-skills
Use when creating new skills, editing existing skills, or verifying skills work before deployment
youtube-clipper
🤖 中间圈 · Agent 编排
Agent 工作流编排、可观测、结构化输出
agent-browser
Browser automation CLI for AI agents. Use when the user needs to interact with websites, including navigating pages, filling forms, clicking…
brainstorming
You MUST use this before any creative work - creating features, building components, adding functionality, or modifying behavior. Explores u…
byted-byteplus-vod-video-enhancement
Upload video/audio media to BytePlus VOD (Video on Demand) storage, returning the Vid and playback references; supports both local file uplo…
byted-las-audio-convert
Converts and transcodes audio file formats and encoding parameters using Volcengine LAS. Audio format conversion between wav, mp3, flac, m4a…
byted-las-video-resize
Resizes, scales, and adjusts video resolution and dimensions using GPU-accelerated NVENC encoding via Volcengine LAS. Video resizing, video …
byted-las-vlm-video
Analyzes and understands video content using Volcengine LAS Doubao vision-language models (VLM). Multimodal AI video analysis, video compreh…
copywriting
When the user wants to write, rewrite, or improve marketing copy for any page — including homepage, landing pages, pricing pages, feature pa…
crewai-multi-agent
Multi-agent orchestration framework for autonomous AI collaboration. Use when building teams of specialized agents working together on compl…
dspy
Build complex AI systems with declarative programming, optimize prompts automatically, create modular RAG systems and agents with DSPy - Sta…
elevenlabs-music
ElevenLabs AI music generation - create original music from text prompts via inference.sh CLI. Capabilities: text-to-music, custom duration …
guidance
Control LLM output with regex and grammars, guarantee valid JSON/XML/code generation, enforce structured formats, and build multi-step workf…
instructor
Extract structured data from LLM responses with Pydantic validation, retry failed extractions automatically, parse complex JSON with type sa…
langchain
Framework for building LLM-powered applications with agents, chains, and RAG. Supports multiple providers (OpenAI, Anthropic, Google), 500+ …
langsmith-observability
LLM observability platform for tracing, evaluation, and monitoring. Use when debugging LLM applications, evaluating model outputs against da…
llamaindex
Data framework for building LLM applications with RAG. Specializes in document ingestion (300+ connectors), indexing, and querying. Features…
outlines
Guarantee valid JSON/XML/code structure during generation, use Pydantic models for type-safe outputs, support local models (Transformers, vL…
pptx
Use this skill any time a .pptx file is involved in any way -- as input, output, or both. This includes: creating slide decks, pitch decks, …
website-to-hyperframes
Capture a website and create a HyperFrames video from it. Use when: (1) a user provides a URL and wants a video, (2) someone says 'capture t…
xiaohongshu-mcp
小红书(RED/XHS)自动化助手。提供完整的小红书操作能力:登录、发布图文/视频、搜索笔记、浏览详情、点赞收藏评论、查看博主主页、内容策划。当用户提到小红书、红书、XHS、RED、发笔记、搜笔记、小红书运营等任何与小红书相关的操作时使用此 skill,即使用户没有明确说小红书但…