Prompt and skill optimization

First, let’s clean up the terminology around the multiple ways we can feed text to LLMs:

| Term | Invocation | Should be optimized | Reusable | Notes |
|---|---|---|---|---|
| Prompt | ad-hoc | not always | usually no | Fundamental LLM concept |
| Command | `/name` | yes | yes | Technically a prompt with metadata |
| Skill | `/name` or autoloaded | yes | yes | Recent concept |
| Instruction | autoloaded, path-scoped | yes | yes | `.cursorrules`, `AGENTS.md`, `CLAUDE.md`, `.github/copilot-instructions.md` |
| Agent prompt | `/name` or autoloaded | yes | yes | Shapes agent persona and default behavior |
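In a typical agentic coding setup, these artifacts live side by side in the repository. A hypothetical layout (the file and directory names follow real Claude Code and Copilot conventions; the skill itself is made up):

```text
repo/
├── AGENTS.md                        # cross-tool project instructions
├── CLAUDE.md                        # Claude Code project instructions
├── .github/
│   └── copilot-instructions.md      # Copilot repo-wide instructions
└── .claude/
    └── skills/
        └── release-notes/
            └── SKILL.md             # reusable skill: /release-notes or autoloaded
```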

To operate all of these confidently, you can either write and optimize them manually or resort to metaprompting (see below).

Write

To boost your AI-focused writing, use these practical guides and specifications.

Prompts are the most fundamental part. Everything here applies elsewhere:

Skills are predefined reusable prompts that the agent invokes when it needs to (usually when the skill metadata advertises it as a match for the task at hand):
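As a minimal sketch, a skill following Anthropic's Agent Skills convention is a directory with a `SKILL.md` file whose YAML frontmatter carries the metadata the agent matches against; the skill below is hypothetical:

```markdown
---
name: changelog-writer
description: Drafts changelog entries from recent commits. Use when the user
  asks to update the changelog or summarize changes for a release.
---

# Changelog writer

1. Run `git log --oneline` since the last release tag.
2. Group commits into Added / Changed / Fixed.
3. Write entries in the imperative mood, one line each.
```

The `description` is what gets autoloaded into context, so it doubles as the advertisement that triggers invocation.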

Instructions are repo-wide prompt files that provide extra details on the whole project or its parts:
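For example, Cursor's newer rule format (`.cursor/rules/*.mdc`) scopes instructions to paths via frontmatter globs; the rule below is a hypothetical illustration:

```markdown
---
description: Conventions for the API layer
globs: src/api/**
alwaysApply: false
---

- All handlers return typed responses; never return raw dicts.
- Validate input at the boundary, not in business logic.
```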

There’s no single authority on how prompts, skills, or instructions should be written; anything goes that makes your combination of an LLM and an agent tick the way you want.

Optimize

Writing a prompt is one thing; ensuring it works as intended and cutting the waste, a.k.a. optimizing, is another.

Basic approaches optimize skills, instructions, and prompts by a set of predefined criteria (e.g., the initial specification for skills) but don’t actually evaluate the outcome:

Look out for the conjecture happening here:

> If your Agent Skill content does not match the specification, agent platforms may not be able to use it. If it’s close, agent platforms may be able to use parts of it, but some types of spec noncompliance may prevent agent platforms from using your Agent Skills entirely.

Maybe they will, maybe they won’t. Who knows. LLMs don’t care about how closely you follow a spec.
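A check of this "predefined criteria" kind is essentially a linter. A minimal sketch, where the required fields are an assumption for illustration rather than the authoritative spec:

```python
import re

REQUIRED_FIELDS = {"name", "description"}  # assumed minimal skill metadata

def lint_skill(skill_md: str) -> list[str]:
    """Return a list of spec-compliance problems; empty means 'looks fine'."""
    match = re.match(r"^---\n(.*?)\n---\n", skill_md, re.DOTALL)
    if not match:
        return ["missing YAML frontmatter block"]
    # Naive key extraction; a real linter would use a YAML parser.
    keys = {line.split(":")[0].strip()
            for line in match.group(1).splitlines() if ":" in line}
    return [f"frontmatter is missing required field: {field}"
            for field in sorted(REQUIRED_FIELDS - keys)]

good = "---\nname: demo\ndescription: A demo skill.\n---\nBody."
bad = "No frontmatter at all."
print(lint_skill(good))  # []
print(lint_skill(bad))   # ['missing YAML frontmatter block']
```

Note that such a linter only verifies form: it says nothing about whether the skill actually improves the agent's behavior.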

More advanced approaches evaluate skills against a set of criteria at run time, using execution traces, scoring rubrics, and multiple evaluation methodologies:

However, this requires significantly more effort.
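The run-time idea can be sketched as a tiny harness: run each prompt variant through the model, score the output against a rubric, and keep the winner. Everything below is hypothetical, and `call_model` is a deterministic stub standing in for a real LLM call:

```python
def call_model(prompt: str, task: str) -> str:
    # Stub standing in for a real LLM call; deterministic for the demo.
    return f"{prompt} :: answer for {task}"

def rubric_score(output: str, must_include: list[str]) -> float:
    # Fraction of required facts present in the output.
    return sum(1 for item in must_include if item in output) / len(must_include)

def evaluate(prompt: str, cases: list[tuple[str, list[str]]]) -> float:
    # Average rubric score over all test cases.
    return sum(rubric_score(call_model(prompt, task), required)
               for task, required in cases) / len(cases)

cases = [("summarize release", ["release"]), ("explain diff", ["diff"])]
candidates = ["Be terse.", "Answer with the task name."]
best = max(candidates, key=lambda p: evaluate(p, cases))
```

Real harnesses replace the keyword rubric with LLM-as-judge scoring or execution traces, but the select-by-measured-outcome loop is the same.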

Finally, there are plenty of opinionated approaches that optimize skills and instructions with AI based on some heuristics inferred from previous experience. Some of them even work.

Meta approaches

For a brief outline of meta-prompting, see this IBM article. The core idea is simple: let the LLM write a new prompt from scratch or optimize a human-written prompt. An example can be found in this blog post.
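As a sketch, the "optimize a human-written prompt" variant boils down to wrapping the original prompt in an optimization instruction and sending that to the model; the wording below is made up, not a canonical template:

```python
def build_metaprompt(original_prompt: str, goal: str) -> str:
    # Wrap a human-written prompt in an optimization request for the LLM.
    return (
        "You are a prompt engineer. Rewrite the prompt below so that it "
        f"better achieves this goal: {goal}\n"
        "Keep it concise, remove ambiguity, and preserve the original intent.\n"
        "Return only the rewritten prompt.\n\n"
        f"--- PROMPT ---\n{original_prompt}\n--- END ---"
    )

meta = build_metaprompt("Summarize this PR.", "reviewer-friendly summaries")
```

The model's reply then becomes the new prompt, which can in turn be fed back into the same loop.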

Some research and empirical evidence suggests (though some disagrees) that AI-written/optimized prompts are generally not that bad and often at least as good as bespoke human-written prompts, but the devil is always in the details.

For one, any prompt that works well with a certain LLM may fail with another one. For another, the quality of an AI-written prompt generally depends on how specific and widely known the domain is; humans tend to outperform LLMs in highly specialized areas where domain expertise is not yet distilled into model weights.

That said, there are of course meta-skills and prompts to build and improve other skills, instructions, and prompts.

It’s probably best to start with the Canonical meta-skills in copilot-collections:

There’s a myriad of other options out there, e.g.:

And so on.

Finally, Andrej Karpathy’s autoresearch pattern inspired a family of skills that optimize other skills automatically:

And so on.

However, be careful: automatic optimization may overfit the result to the evaluation criteria, causing brittleness in a wider range of scenarios.

Provider-specific resources

Copilot skills and per-repo instructions:

Claude skills and scoped instructions (rules); mind that Copilot auto-discovers .claude/skills/: