
How to run a Microsoft 365 Copilot pilot programme: a step-by-step guide

A well-designed Copilot pilot produces data you can use to make a decision. A poorly designed one produces anecdotes that justify whatever conclusion you already had.

What a pilot is actually for

Many organisations run a Copilot pilot not to generate data but to satisfy a governance requirement. A decision has already been made; the pilot is the process by which that decision becomes formally documented. This is understandable but wasteful. A properly structured pilot creates something more valuable: measurable evidence of what Copilot does to productivity in your specific organisational context, with your specific people and workflows. That evidence is worth having regardless of whether wider rollout was always going to happen.

A good pilot should produce two outputs: first, a set of data points on adoption rate, time saving, and confidence change that can be presented to leadership; second, a small cohort of people who have genuinely changed how they work and can act as credible internal advocates for wider adoption. Both require nine weeks and a structured programme. Neither can be achieved with a two-week free-for-all.

Choosing the right pilot cohort

Cohort selection is the most consequential decision in pilot design. The wrong cohort produces misleading data and a difficult advocacy problem. The right cohort produces useful data and a group of people who genuinely want to tell their colleagues what changed.

The ideal pilot cohort is 20–30 people. This is large enough to produce statistically meaningful aggregate data and small enough to manage without significant coordination overhead. Below 15 people, individual variation dominates the aggregate and you cannot draw confident conclusions. Above 40, the facilitator burden becomes significant and the pod dynamic that drives accountability becomes harder to maintain.

On composition: include people at different seniority levels. Senior participants provide advocacy value: when a head of department says the programme changed how they prepare for board meetings, that lands differently than when a junior analyst says the same thing. But senior-only cohorts underrepresent the volume use cases (email, document drafting, data analysis) where Copilot produces the most aggregate time saving. A mix of seniority produces a richer evidence base.

Critically: do not select based on enthusiasm for technology. The pilot is not a proof-of-concept with early adopters; those people were always going to succeed. The pilot should include a realistic cross-section of your workforce, including people who are sceptical or indifferent at the outset. Their outcomes are the ones that prove the programme works for a general population.

The one selection criterion that matters: a willing facilitator. The pilot needs one named person to run the weekly challenge cadence, post to the Teams channel, manage the leaderboard, and send the midweek nudge. This person needs one to two hours per week for nine weeks. If you cannot secure that commitment before the pilot starts, the pilot will not run properly.

Setting success criteria upfront

Define what success looks like before week one begins, not after week nine ends. Three success criteria are sufficient:

  1. Active adoption rate at week nine: what percentage of pilot participants are using Copilot at least three times per week by the end of the programme? A realistic target for a structured programme is 70–80%.
  2. Confidence delta: what is the average increase in self-reported Copilot confidence between the pre-programme baseline survey and the week-nine completion survey? A meaningful target is a 50% or greater increase.
  3. Time saving self-report: what is the average weekly time saving reported by participants at week nine? A conservative but credible target for a general knowledge worker cohort is 90 minutes per week per person.

Write these targets down. Share them with the pilot sponsor before the programme starts. Then report against them honestly at the end.
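
Scoring the results against those targets is a few lines of work once the surveys are exported. The sketch below is a minimal illustration, not part of the kit: the field names, survey records, and baseline figure are hypothetical, and you would substitute your own export format.

```python
# Minimal sketch: checking week-nine survey results against the three
# pre-agreed targets. Field names and figures are hypothetical examples.

# One record per participant from the week-nine completion survey.
week_nine = [
    {"uses_per_week": 5, "confidence": 8, "minutes_saved_weekly": 120},
    {"uses_per_week": 2, "confidence": 6, "minutes_saved_weekly": 45},
    {"uses_per_week": 4, "confidence": 9, "minutes_saved_weekly": 150},
    # ... one entry per pilot participant
]
baseline_avg_confidence = 4.2  # average from the week-zero survey

n = len(week_nine)

# Criterion 1: share of participants using Copilot 3+ times per week.
adoption_rate = sum(p["uses_per_week"] >= 3 for p in week_nine) / n

# Criterion 2: average confidence increase over the week-zero baseline.
avg_confidence = sum(p["confidence"] for p in week_nine) / n
confidence_delta = (avg_confidence - baseline_avg_confidence) / baseline_avg_confidence

# Criterion 3: average self-reported weekly time saving in minutes.
avg_minutes_saved = sum(p["minutes_saved_weekly"] for p in week_nine) / n

print(f"Active adoption rate: {adoption_rate:.0%} (target 70-80%)")
print(f"Confidence delta:     {confidence_delta:+.0%} (target +50% or more)")
print(f"Avg time saving:      {avg_minutes_saved:.0f} min/week (target 90)")
```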

The nine-week structure

A nine-week pilot should not be a period of free exploration. It should be a structured progression from basic capability through advanced application. Each week has one challenge, posted on Monday, with submissions due by Friday. The challenge uses real work files and real work scenarios, not sample data or test environments.

The progression matters as much as the duration. Weeks one to three cover the foundations: effective prompting in Copilot Chat, email drafting and management in Outlook, and document summarisation in Word. Weeks four to six build depth: Excel data analysis, PowerPoint presentation creation, and Teams meeting intelligence and follow-up. Weeks seven to nine push toward advanced application: multi-step prompting, cross-application workflows, and an introduction to Copilot agent design for repetitive tasks.

Participants are grouped into pods of four to six people. A pod leaderboard updates weekly and is visible to all participants in the programme Teams channel. This creates the peer accountability dynamic that sustains engagement through weeks three to six, the period where novelty has worn off but habit has not yet formed.
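
A shared spreadsheet is usually enough for the leaderboard itself, but the scoring logic is simple to express in code if you want to automate the weekly update. A minimal sketch, assuming one point per challenge submitted by the Friday deadline; the pod assignments and submission log below are hypothetical examples.

```python
from collections import defaultdict

# Hypothetical pod assignments (pods of four to six in practice).
pods = {
    "alice": "Pod A", "bala": "Pod A", "carol": "Pod B",
    "dev": "Pod B", "erin": "Pod C", "femi": "Pod C",
}

# (participant, week) pairs: one entry per challenge submitted on time.
submissions = [
    ("alice", 1), ("bala", 1), ("carol", 1), ("erin", 1),
    ("alice", 2), ("carol", 2), ("dev", 2), ("femi", 2),
]

def leaderboard(up_to_week: int) -> list[tuple[str, int]]:
    """Total completed challenges per pod, highest first."""
    scores: dict[str, int] = defaultdict(int)
    for person, week in submissions:
        if week <= up_to_week:
            scores[pods[person]] += 1
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)

# Post this to the Teams channel each Monday alongside the new challenge.
for pod, score in leaderboard(up_to_week=2):
    print(f"{pod}: {score}")
```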

What to measure and when

Three data collection points are sufficient for a well-evidenced pilot. For a fuller treatment of which metrics to track and why, see our article on Copilot adoption metrics that actually matter.

Pre-programme (week zero): Capture baseline Copilot confidence (1–10), current usage frequency, and primary task categories where participants think Copilot might help. Also pull the current active adoption rate from the Microsoft 365 admin centre for the pilot group if tenant-level reporting allows it.

Mid-programme (week five): A brief check-in survey covering challenge completion rate, current confidence, and any blockers. This is early enough to make adjustments if something is not working: a challenge that is too hard, a pod that is not engaging, or a facilitator who needs support.

Post-programme (week nine): Full completion survey covering confidence, spontaneous use rate, time saving estimate, and a qualitative open question on what specifically changed. Pull active adoption rate from the admin centre again for comparison. Compile the results into a one-page summary for leadership.

Presenting results to leadership

The post-pilot leadership presentation should be short (one page or five slides) and should focus on three things: what changed (the data), what it means in financial terms (the ROI calculation), and what the recommendation is (structured wider rollout).
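
The ROI calculation itself is straightforward arithmetic built on the week-nine time-saving figure. A minimal sketch follows; every number in it is a placeholder assumption (cohort size, fully loaded hourly cost, and licence price should all be replaced with your own figures, checking current Copilot pricing for your agreement).

```python
# Minimal ROI sketch. All figures are placeholder assumptions: substitute
# your own survey results, staff costs, and current licence pricing.
participants = 25
avg_minutes_saved_weekly = 90          # from the week-nine survey
loaded_hourly_cost_gbp = 45.0          # fully loaded cost per person-hour
licence_cost_per_user_monthly = 24.70  # placeholder; check current pricing
working_weeks_per_year = 46

annual_hours_saved = participants * avg_minutes_saved_weekly / 60 * working_weeks_per_year
annual_value = annual_hours_saved * loaded_hourly_cost_gbp
annual_licence_cost = participants * licence_cost_per_user_monthly * 12

print(f"Annual hours saved:   {annual_hours_saved:,.0f}")
print(f"Estimated value:      £{annual_value:,.0f}")
print(f"Annual licence cost:  £{annual_licence_cost:,.0f}")
print(f"Return multiple:      {annual_value / annual_licence_cost:.1f}x")
```

Present the time-saving estimate as self-reported and the hourly cost as an assumption; a leadership audience will trust a conservative, clearly labelled calculation over an aggressive one.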

The most persuasive element of any pilot results presentation is not the aggregate data; it is a direct quote from a sceptical participant describing something specific that changed. Find two or three people who were not technology enthusiasts at the start of the pilot and ask them to describe in their own words what Copilot now does for them that it could not do at week one. Those accounts are more compelling to a leadership audience than any graph.

Finish with a clear ask: the cohort size for the next phase, the timeline, the cost, and the named owner. If the next step is a wider rollout, our guide on how to write a Copilot adoption business case gives you the five-element structure that gets approved. A leadership presentation that ends without a specific request produces positive feedback and no action. One that ends with a specific, costed proposal produces a decision.

The Copilot Bootcamp Kit Pilot tier (£497 ex VAT) gives you everything you need to run a structured nine-week pilot for up to 30 participants: the full challenge pack, facilitator guide, leaderboard tracker, and pre- and post-programme survey templates. No consultants. Set up in a weekend.

See the Pilot tier