Levelset · Methodology

01 · The thesis

Why five and not seven

Most transformations fail on the human side.

The rubric started from a single observation. In the post-mortems of programs that missed go-live, the explanation almost never traces to the platform. It traces to sponsorship that drifted, to a change-management workstream that was never elevated, to a workforce that was trained the month before cutover, to a communications cadence that left rumor to fill the silence, and to a steering committee that re-litigated the same design decisions every month. The technical work was usually fine. The organizational work was treated as scaffolding around the real program.

Five pillars is the smallest set that distinguishes those failure modes from each other. Fewer pillars collapse the categories into a single "soft side" bucket that gives no leverage for action. More pillars dilute the rubric into adjacent disciplines (data architecture, vendor management, financial modeling) where the failure modes are different and this rubric would mis-grade.

The pillars are weighted equally. We have tried weighting Sponsorship and Change higher, on the grounds that those failure modes are the most common, and found that the recommendation set gets worse, not better. An equal weighting forces the analyst to look at each pillar on its own evidence rather than nudging the score toward what the rubric already weighs.

02 · The rubric

5 pillars · 22 sub-criteria

What each pillar grades, and what it doesn't.

Enterprise Transformation Strategy

Whether the program has a strategy that the operating groups understand, or an implementation plan that IT is executing on. 5 sub-criteria.

Grades

▴Whether a board-approved transformation vision exists with measurable outcomes
▴How decisions escalate between working committee and steering, and how fast
▴Whether the SI was selected with operating-group involvement or by IT alone
▴Whether business outcomes and KPIs are documented and owned
▴Whether the program lead has authority commensurate with the political work

Does not grade

▾Whether the technology choice is correct
▾Whether the business case math holds up
▾Vendor commercial terms or SOW quality

Sample probes

"Who has the authority to change the design when the SI pushes back?"
"The last time a contentious design decision came up, where did it go to die?"
"What are the three KPIs your CFO will judge this program against in 18 months?"

Organizational Change Management

Whether the people doing the work have been brought into the change or had it sent to them. Where OCM reports. How resistance is being identified. 4 sub-criteria.

Grades

▴Where the OCM function reports and whether that reflects business priority
▴Whether stakeholder impact has been mapped, not just listed
▴How resistance is being detected and tracked at the team level
▴Whether adoption is being measured against go-live, not training completion

Does not grade

▾Quality of individual change manager hires
▾Whether a Prosci methodology is being followed
▾Cultural assessments or engagement-survey results

Sample probes

"Walk me through how your operating groups first learned this program was happening."
"Who does your OCM lead report to, and do they sit on the steering committee?"
"How will you know in week three of go-live that adoption is in trouble?"

Strategic Communications and Executive Messaging

Whether the program tells a coherent story to the people whose job it changes. CEO message, cadence, and translation across operating groups. 4 sub-criteria.

Grades

▴Whether the CEO has delivered a substantive opening message
▴Monthly cadence and whether milestones land on it
▴Audience segmentation and translation through operating-group leaders
▴Whether bad news travels as fast as good news

Does not grade

▾Brand and design quality of program materials
▾Open rates or other vanity engagement metrics
▾External communications and analyst-relations work

Sample probes

"What is the next communication going out, and what does it say?"
"The last time the program slipped, who heard about it first and how?"
"Who is responsible for translating the CEO's message into the language of your warehouse supervisors?"

Training and Enablement

Whether training is role-based and workflow-oriented or system-click. Whether SMEs and super users have time blocked to co-design. 4 sub-criteria.

Grades

▴Whether training philosophy is role-based or system-click
▴SME resource plan with backfill commitments from middle managers
▴Sustainment plan for new hires and role changes post go-live
▴Whether super users are co-designing or just attending

Does not grade

▾LMS platform choice
▾Video production quality
▾Course completion percentages

Sample probes

"If we pull your best supply-chain planner for six months, who is doing their day job?"
"What is the ratio of training delivery time to the complexity of what's being implemented?"
"In six months when someone new joins, how do they get trained?"

Transformation Recovery and Reset Capability

Whether the program can detect drift and recover from setbacks. Leading indicators, course-correction history, and the rhythm of contentious decisions. 5 sub-criteria.

Grades

▴Whether leading indicators of drift are defined and tracked
▴Whether the program has executed a meaningful mid-course correction
▴Whether bad news flows up faster than good news
▴Decision velocity at the working committee
▴Whether the program has a documented reset plan if go-live slips

Does not grade

▾Contingency budget size
▾Risk register completeness as a paper artifact
▾Disaster recovery and technical resilience

Sample probes

"What would it take to reset the program if a go-live needed to be postponed by 90 days?"
"When was the last time a decision was made at the working committee and not relitigated at steering?"
"What's the indicator you check on Monday mornings to know if the program is on track?"

03 · The anchor scale

One sub-criterion, all five anchors

What "3 out of 5" actually means.

Every sub-criterion has anchor descriptions for each of the five score levels. This is what keeps the rubric defensible: there is a definition of what "3" looks like, not just a gestalt. Below is one example sub-criterion, drawn from Pillar 01.

Pillar 01 · Sub-criterion 03 · Current edition

Decision rights at the working committee

The question is whether the working committee has the authority and the rhythm to resolve contentious design decisions on its own, or whether everything routes to steering by default.

1 · Critical risk.

No working committee charter exists. Every contentious design decision escalates to the CEO or CFO. Steering is in tactical conversations regularly. Decisions take 2 to 4 weeks. The program is governance-bottlenecked.

2 · Significant risk.

A charter exists on paper but is not followed in practice. Steering routinely re-litigates working committee decisions. Decisions land in 1 to 2 weeks with rework. Operating-group reps escalate around the committee chair rather than through them.

3 · Mixed health.

Charter exists and is mostly followed but lacks explicit tie-breaking authority. Most decisions land in 5 to 10 business days. Contentious decisions still escalate to steering. The committee chair is leaning on personal credibility, not structure.

4 · Generally healthy.

Charter is clear, the chair has tie-breaking authority, and escalation criteria are explicit and documented. Decisions land within 5 business days. Steering's calendar shows them on strategic items only.

5 · Strength.

Charter is operating. The committee has resolved at least two real disputes without escalation. Decision velocity is measured and reported. The chair is teaching the structure to peer committees in adjacent programs.

Every other sub-criterion in the rubric is anchored the same way. The full document, all sub-criteria with their anchor scales, is available on request and shipped with every Enterprise tier engagement.

04 · How scoring works

From sub-criterion to overall health

A 3 is not passing.

A pillar score is the rounded mean of its sub-criterion scores. The overall program score is the rounded mean of the five pillar scores. There is no weighting hidden inside the formula.
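
The arithmetic can be sketched in a few lines of Python. This is a minimal illustration, not the production scoring code: the function names, the short pillar names, and the example scores are all hypothetical, and the tie-breaking rule (a mean ending in .5 rounds up) is an assumption, since the rubric says only "rounded mean".

```python
import math

# Score labels as used in the anchor scale (1 = worst, 5 = best).
ANCHOR_LABELS = {
    1: "Critical risk",
    2: "Significant risk",
    3: "Mixed health",
    4: "Generally healthy",
    5: "Strength",
}


def rounded_mean(scores):
    """Mean of 1-5 scores, rounded to the nearest integer.

    Assumption: exact halves round up (2.5 -> 3). Python's built-in
    round() would round 2.5 to 2, so it is deliberately not used here.
    """
    if not scores:
        raise ValueError("at least one score is required")
    return math.floor(sum(scores) / len(scores) + 0.5)


def program_score(pillars):
    """Return (pillar scores, overall score) with no weighting applied."""
    pillar_scores = {name: rounded_mean(subs) for name, subs in pillars.items()}
    overall = rounded_mean(list(pillar_scores.values()))
    return pillar_scores, overall


# Illustrative sub-criterion scores only; not drawn from any real program.
example = {
    "Strategy": [2, 3, 3, 4, 3],
    "Change": [2, 2, 3, 3],
    "Communications": [4, 4, 3, 4],
    "Training": [3, 3, 2, 3],
    "Recovery": [2, 3, 2, 2, 3],
}

pillar_scores, overall = program_score(example)
print(pillar_scores)
print(overall, ANCHOR_LABELS[overall])
```

Note that the overall score is the mean of the five rounded pillar scores, not the mean of all 22 sub-criterion scores, so a pillar with four sub-criteria counts exactly as much as one with five.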

A 3 means mixed health. It is not a passing grade. It is identifiable risk that will compound if not addressed, and the report's roadmap is sequenced to address it. In our calibration work, programs scoring 4 or above before go-live were consistently more likely to go live on time than those scoring 3 or below.

The diagnostic also reports a separate confidence score per pillar, 1 to 5, that captures how much signal the interview surfaced for that pillar. A pillar with a score of 4 and confidence of 2 deserves more interview time than its number suggests.
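
One way to act on the confidence score is to flag low-confidence pillars for follow-up. The sketch below is illustrative only: the `PillarResult` type, the `needs_more_interview` helper, and the cutoff of 2 are assumptions made for the example, not part of the rubric, and the sample values are invented.

```python
from dataclasses import dataclass


@dataclass
class PillarResult:
    name: str
    score: int       # 1-5 rubric score
    confidence: int  # 1-5, how much signal the interview surfaced


def needs_more_interview(results, max_confidence=2):
    """Names of pillars at or below the confidence cutoff, least confident first.

    The cutoff of 2 is an assumed threshold chosen for illustration.
    """
    flagged = [r for r in results if r.confidence <= max_confidence]
    return [r.name for r in sorted(flagged, key=lambda r: r.confidence)]


# Invented values for illustration only.
results = [
    PillarResult("Strategy", score=4, confidence=2),
    PillarResult("Change", score=2, confidence=5),
    PillarResult("Training", score=3, confidence=1),
]

follow_up = needs_more_interview(results)
print(follow_up)
```

The filter ignores the score on purpose: a high score on thin evidence is as much a reason to keep interviewing as a low one.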

05 · Honest omissions

What this rubric won't tell you

What the rubric isn't for.

Not graded

Technology choice and architecture.

We do not grade whether SAP, Oracle, or Workday was the right choice. We do not grade integration architecture, data model design, or technical debt. Those are real risks, and they need their own diagnostic. We are not that.

Not graded

Vendor and SI relationships.

We do not grade whether your SI is the right one, whether the contract is structured well, or whether the rate card is reasonable. We grade only how the SI relationship interacts with the human-side risks above.

Not graded

Business case and ROI math.

We grade whether you have agreed-upon KPIs and outcomes. We do not validate the numbers. A program with a great rubric score and a fictional business case is still going to be in trouble at year three.

Not graded

Personality and culture diagnostics.

We do not grade your CEO, your operating-group heads, or your team culture. The rubric describes structural conditions, not personalities. Those structural conditions are tractable. Cultural change is a different and longer engagement.

06 · Calibration

Where the rubric came from

Drawn from programs that missed go-live.

The rubric is calibrated against post-mortems and live transformation engagements, weighted toward programs that missed go-live or went live and then quietly under-delivered. Most rubrics are written from success cases. We have found that success cases tell you less than failures: they share too many uncontrolled variables.

The calibration set is roughly 60% ERP (SAP, Oracle, Workday), 25% HCM and finance transformations, and 15% operational and supply-chain programs. Geographically, the set skews toward North America and Western Europe. Programs under $5M are not represented in the calibration set and are not within the rubric's scope.

Three rubric criteria were discarded in prior iterations because they correlated with seniority of the interview subject rather than with program outcomes. Two criteria were added in the current edition (the Recovery pillar's "leading indicators" and "decision velocity" sub-criteria) because they correlated more strongly with on-time go-live than any of the criteria we already had.

07 · The interview style

How the analyst behaves

Push past the prepared answer.

The analyst's job is not to take your word for it. The job is to push until either the program has a real answer or the absence of one becomes the score. Below is the kind of exchange the analyst is calibrated to drive.

Analyst

How is your steering committee structured?

Subject

CFO chairs it. CEO is on it. My COO peers are on it. We meet monthly.

Analyst

When was the last time the steering committee decided something that the working committee had already taken a position on?

Subject

Last month. There was a question about how procurement requisitions would route in the new system.

Analyst

Which way did steering decide?

Subject

Against the working committee's recommendation, actually.

Analyst

How did the working committee chair learn about that?

Subject

I told them after the meeting. They were not happy.

Analyst

And what is the next contentious decision the working committee is queued up to make?

Subject

Probably the role-based access model. It's contentious because finance wants tighter controls than operations does.

Analyst

What does the working committee chair think will happen if they make a call this time?

Subject

Honestly, after last month, they're probably going to bring it to steering preemptively. Which is exactly the dynamic I was hoping to avoid.

Scoring

The subject walked in believing their governance was healthy. The probe surfaced that steering had overridden the working committee on a tactical decision, that the chair had no warning, and that the next contentious decision will preemptively escalate. The sub-criterion was scored 2 (significant risk) with high confidence. The score reflects the structural pattern, not the subject's effort.

The rubric is a tool

The diagnostic is the experience of being graded against it.

