What Can Red Teams Learn From Aviation?
Checklists, Crew Resource Management, and Red Teaming
Introduction – The Why
Commercial aviation transformed itself from a high-risk industry into one of the safest in the world. That success came not from technology alone but from a deliberate cultural shift. Three innovations stand out: the disciplined use of checklists, a relentless pursuit of excellence and continuous improvement within a no-blame culture, and the adoption of Crew Resource Management (CRM), which reshaped how teams communicate and make decisions under pressure. Data collection and human factors increasingly play a large role in optimising high-performing, high-impact teams – this can be seen in top military units as well as top sports teams.
Aviation is also a good illustration of how many industry-wide problems in cyber have already been solved by other, more mature sectors (mature in terms of how long the industry has existed). Not every problem in cybersecurity is unique, and we should guard against ‘Not Invented Here’ syndrome.
This post argues that careful implementation of the same practices and mental models can significantly improve Red Team operations. When applied thoughtfully, checklists and CRM can improve OpSec, reduce the risk of unintended impact in client environments, ease reporting and deliver more authentic training scenarios for Security Operations Centres (SOCs). When applied poorly, they can burden operators, decrease agility or create “checklist fatigue” and box-ticking. Like many things in life, the answer lies somewhere in the middle rather than at the extremes.
This isn’t about introducing meaningless process into Red Teaming, nor is it the over-formalisation of our job, and it certainly isn’t the ghastly trend of introducing military language into civilian disciplines (notwithstanding the origins of Red Teaming).
Shared Mental Models
Aviation accident reports repeatedly show how crews fail when they lose a shared understanding of what is happening or how the situation is changing. Hierarchical structures can also stop junior staff from speaking up or being heard. The two incidents below (simplified summaries of very complex events; links to the investigations are in the References) will be used as case studies throughout this post.
- United Airlines Flight 173 (1978): The Captain fixated on a landing gear issue, while the first officer and flight engineer were aware of dwindling fuel but did not speak up forcefully enough. The aircraft eventually ran out of fuel whilst the Captain was still focused on the landing gear, killing ten people.
- British Airways November Oscar incident (1989): The pilot, Glen Stewart, flew a dangerously low approach and landing at Heathrow in breach of safety regulations, but believed it was the best option when faced with a set of entirely negative alternative courses of action. This incident is often cited as a textbook example of how the ‘you need to make it work’ pressure of an organisation, combined with unfortunate circumstances, encourages poor decision making. Stewart was convicted in a criminal court for his actions (the first such conviction in British history) and took his own life shortly afterwards.
Relevance to Red Teaming
Many operators will have been in situations where a colleague decided to use a particular TTP or tool that they disagreed with but didn’t speak up about, leading to an outage, detection or similar. Equally, we have all felt the pressure to succeed and hit all of the objectives in the time allocated – rushing commands and making OpSec failures under time pressure. When clients are paying large sums of money for the work, the answer of ‘we couldn’t make any progress so tried $TTP which got us burned’ is particularly unpleasant.
Red Teams are also prone to tunnel vision. A Primary operator may focus narrowly on getting an evasion technique to work or debugging why a payload is failing, while nobody steps back to ask “do we even need to implant this host? We can reach it remotely”.
Simplicity helps. Consider holding a micro-brief to ensure shared understanding of the plan before high-risk action such as a persistence install or lateral movement attempt. It should last no more than a minute:
- What is the intent?
- What telemetry do we expect?
- What would a world-class SOC see?
- What do we expect the client’s SOC to actually see (the two are frequently different!)?
- What are the things that can go wrong, and have we mitigated them as much as is practicable?
- How do we reverse if needed? What happens if the payload is locked by the OS?
Empower the Secondary operator to pause the TTP whenever observed behaviour diverges from the plan. This habit builds a shared mental model of what is expected into the workflow and prevents single-point focus or rabbit-hole fixation.
The trade-off is that such briefs add a small pause. But as aviation discovered through incident reporting and extensive human factors research, that pause usually saves far more time than it costs by preventing errors that require hours of remediation. In Red Teaming, this will avoid awkward client conversations about outages or detections, whereas in aviation it is usually about avoiding aircraft and human loss.
Checklists: Read-Do and Do-Confirm
NASA Research
NASA’s human factors work distinguishes between two broad types of checklist:
- Read-Do. Each step is read aloud and then carried out. These are used for rare or high-risk procedures where memory and instinct cannot be trusted.
- Do-Confirm. Operators perform the routine sequence from habit, then pause to confirm that critical steps have not been missed.
Both approaches fail if designed poorly. Overly long lists, vague or waffly items, or badly timed pauses lead to “checklist complacency”. Nor is this limited to aviation; studies in healthcare showed that surgeons using the WHO Surgical Safety Checklist saw dramatic reductions in mortality when short do-confirm lists were used at natural pause points, but noticeably poorer results when they became box-ticking exercises.
Application to Red Teaming
The same logic applies. This can be as simple as the Primary drafting C2 commands in VSCode (not in the implant command entry box…) and having them checked by the Secondary via screenshare before being pasted into the C2. Virtually every Operator will have had failures where they have pasted half-complete commands, commands with stray line endings, or commands from their lab directly into a client environment!
Use read-do for actions that are rare or difficult to reverse if they go wrong: persistence mechanisms, privilege escalation relying on vulnerability exploitation, high-impact database queries, deployment of implants to network appliances / non-standard platforms, or changes to client / cloud infrastructure (we all love a CI/CD pipeline compromise after all).
Use do-confirm for routine but still risky phases: payload delivery and opsec steps, lateral movement, or cleanup actions (registry, filesystem, infrastructure, IAM backdoors).
For example, before persistence installation a short read-do list might include:
- Verifying the implant SHA256 hash is noted in $reporting_platform,
- Checking mutex uniqueness,
- Ensuring .Net works correctly in the current implant if needed (although… BOFs),
- Confirming a backup implant is beaconing,
- Validating removal steps.
Such a list takes under a minute yet dramatically reduces the risk of your TTP failing or of hunting for missing information in ten weeks’ time when reporting…
The key is restraint and not trying to cover every possible use case. At some point the experience of your team members has to be trusted. Red Teaming isn’t hacking by numbers or by checklist. Limit your lists to 5–9 items and place them at genuine pause points. Anything longer and operators will start to bypass them or stop applying conscious scrutiny – the very thing we are trying to avoid.
The tracking of these checklists and how the team interact with them is key to their adoption. What gets measured, gets done. If there is no confirmation of whether the checklist was used or ignored, it will be ignored. Equally, if it is mandraulic or takes excessive time to use, it will be ignored. Consider building simple Markdown tick blocks that can be inserted quickly into your op notes via a button or keyboard shortcut, or simple Slack workflows. MDSec have done some good work in this area with Nighthawk integrations via API into Outline / CodiMD for reporting – you can pre-populate implant metadata or IOCs from the C2.
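As an illustration of the Markdown tick blocks mentioned above, here is a minimal Python sketch; the checklist items and the op_notes.md filename are assumptions for illustration, not a prescribed format.

```python
# Minimal sketch: render a checklist as a Markdown tick block that can be
# appended to op notes (bind it to an editor shortcut or hotkey if useful).
# The items and the op_notes.md filename are illustrative only.
from datetime import datetime, timezone

def tick_block(title: str, items: list[str]) -> str:
    stamp = datetime.now(timezone.utc).strftime("%Y-%m-%d %H:%M:%SZ")
    lines = [f"### {title} – {stamp}"]
    lines += [f"- [ ] {item}" for item in items]
    return "\n".join(lines) + "\n"

if __name__ == "__main__":
    block = tick_block("Persistence pre-checks", [
        "Implant SHA256 noted in reporting platform",
        "Mutex uniqueness checked",
        "Backup implant beaconing",
        "Removal steps validated",
    ])
    with open("op_notes.md", "a") as f:  # append to op notes
        f.write(block)
    print(block)
```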
These steps should also be combined with sensible C2 tooling choices – granular recording of when commands were submitted to the server and picked up by the implant, automatic upload/download hashing and tracking, and a cancel command. Implants and tools should be engineered to act as a backstop against operator error – people make typos and mistakes – and catching them before they balloon is essential. This is the implant equivalent of a safety cover over the lever marked ‘turn off engines’.
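If your C2 does not record hashes and submission times for you, a small hypothetical wrapper along these lines can act as a stopgap; the log location and field names are assumptions, not a standard.

```python
# Hypothetical stopgap if the C2 does not hash and timestamp uploads itself:
# record SHA256 and a submission timestamp as a JSON line before the file is
# sent. The log location and field names are assumptions.
import hashlib
import json
import time
from pathlib import Path

LOG = Path("op_actions.jsonl")  # illustrative location

def record_upload(path: str, target_host: str, operator: str) -> dict:
    digest = hashlib.sha256(Path(path).read_bytes()).hexdigest()
    entry = {
        "ts_submitted": time.time(),
        "action": "upload",
        "file": path,
        "sha256": digest,
        "target": target_host,
        "operator": operator,
    }
    with LOG.open("a") as f:
        f.write(json.dumps(entry) + "\n")
    return entry

# Example: record_upload("loader.bin", "SRV01", "primary")
```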
Threat and Error Management
Definition
Aviation frames operations through Threat and Error Management (TEM): anticipate threats, trap errors, and recover from undesired states.
Application to Red Teaming
Threats to successful red teaming include client detection tools, excessive risk aversion from the control group, fragile production workloads, or using the wrong payload on the wrong host.
Errors could be running a noisy scan because of a typo, missing double brackets in your LDAPS/ADWS query, splitting commands across newlines, leaving an unbounded SQL query, or deploying an untested or unkeyed implant into untested processes. Every operator will have a story of exiting an old Cobalt Strike implant that was in winlogon.exe, which used to exit the process rather than the thread…
Undesired states include SOC alerts reaching analysts, production systems being degraded, or the client losing faith in your abilities, risk management or tooling.
The TEM model acknowledges that threats and errors are normal. What matters is trapping them early and recovering quickly. That is precisely what the Secondary operator role achieves. They are a skilled sanity-check and act as a backup and support to the Primary, rather than second guessing, undermining and being cynical about any decision making – ‘I wouldn’t have done it like that‘.
Crew Resource Management (CRM) Behaviours
CRM reshaped aviation by training crews to communicate clearly, challenge decisions, and share workload. Both CAA and FAA guidance stress that SOPs and roles exist to build a shared mental model. Airlines historically had extremely hierarchical structures, with First Officers unwilling to challenge the Pilot in Command or Captain. This effect was particularly pronounced in some Asian carriers and is directly cited in several incident reports as a contributory factor.
Primary and Secondary Operators
- Primary Operator: Executes on keyboard.
- Secondary Operator: Monitors context and risk, sanity checks commands and runs record keeping / screenshots, with explicit authority to pause actions.
A simple example of this in use would be:
- Primary: “Prepping COM hijack”
- Secondary: “Checking the expected hash and keying of the payload on the C2 server filesystem…. good to go”
- The command is then sent, and the Secondary records timestamps and file hashes in whatever platform the team uses.
This style is neutral, not militaristic. Its value is that it normalises healthy friction exactly when it is needed. It also aims to catch the most likely issues that crop up – using the wrong payload with the wrong URLs or keying, as virtually every action is pre-tested in a lab environment in modern Red Teaming.
Checklists in Practice
Checklist for Persistence
A sample persistence checklist might include:
- EDR Lab testing complete.
- Pivot / Egress shellcode decided on.
- New C2 URIs and hostnames propagated.
- Domain keying enabled.
- External encryption key retrieval active.
- Mutex confirmed.
- OpSec fails / ‘bad strings’ stripped.
- Backup implant at a sensible sleep so it can be used to correct issues if needed.
- Payload hash recorded.
- Timestamp logged.
- Output captured for report.
- Backup implant sleep re-increased.
These short items catch the most frequent errors.
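To keep such lists versioned alongside tooling (the ‘checklists as code’ idea that reappears later in this post), they can also live as plain data. A rough Python sketch of the list above, which could equally be dumped to JSON:

```python
# Illustrative only: the persistence checklist above expressed as data, so it
# can be versioned in Git and rendered or served by whatever tooling you use.
import json

PERSISTENCE_CHECKLIST = {
    "name": "persistence",
    "items": [
        "EDR lab testing complete",
        "Pivot / egress shellcode decided on",
        "New C2 URIs and hostnames propagated",
        "Domain keying enabled",
        "External encryption key retrieval active",
        "Mutex confirmed",
        "OpSec fails / bad strings stripped",
        "Backup implant at a sensible sleep for correction if needed",
        "Payload hash recorded",
        "Timestamp logged",
        "Output captured for report",
        "Backup implant sleep re-increased",
    ],
}

# Dump to a checklists/ directory in your tooling repo, or print for review.
print(json.dumps(PERSISTENCE_CHECKLIST, indent=2))
```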

Lateral Movement
Aviation Reference
The incidents involving United 173 and Eastern 401 (where the crew fixated on a failed nose-gear indicator light while the aircraft descended into the Everglades) show what happens when crews become fixated on a detail and lose oversight, or fail to consider factors outside the ones they are directly looking at. Lateral movement within Red Teaming can present the same problem: operators intent on progressing may overlook other attack paths or applicable detection risks (e.g. clients having a different EDR config on workstations vs servers, or SOCs being empowered to auto-isolate certain categories of host and cause business disruption).
Checklist for Lateral Movement
- Target host within scope.
- TTP chosen and payload tested.
- Keyed payload placed on C2 server.
- Backup implant working and implant sleep settings adjusted to allow for any back-out needed.
- Confirm expected telemetry from lab environment – inserted into report for debriefing.
- Control Group notified if scenario or RoE requires.
- Secondary monitors Redirector logs and C2 commands throughout.
These steps reinforce TEM’s core idea: undesired states such as payload sandboxing / version control errors are inevitable, but they can be contained if teams maintain situational awareness and work as a team – two brains on one task rather than two brains on two tasks.
Configuration Discipline
OpSec and checklists are not exclusively about payloads or TTPs. Reporting is basically the only deliverable that the client receives (and often cares about).
Aviation is a famously paperwork- and certification-heavy profession, with staff employed by Air Operator Certificate (AOC) holders specifically to manage the never-ending paperwork. Within commercial Red Teams the luxury of a non-billable staff member is rarely possible, so judicious use of automation can allow for good record keeping, automated OpSec protections, reduced operator load and less boring admin.
- One button to rotate redirectors, CDNs and DNS entries where needed.
- Automated reminders and tearing down of infrastructure promptly after use to prevent unwanted categorisation and excessive cloud costs.
- HTTPS certificate hygiene and threat intel feed management – for example using wildcard certificates, or ensuring that each C2 profile has a different JA3 hash.
- Storing configurations in version control with manifests – a simple stage at the bottom of your RunDeck / Ansible Semaphore / Red Commander playbook can push an infrastructure summary to Git, or post it to Slack (see the sketch after this list). This can also be posted to Outline or whichever note-taking platform you use.
- Logging infrastructure changes consistently – the use of Terraform encrypted remote state is the most common solution.
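A rough sketch of that manifest stage, assuming a Slack incoming webhook, a local Git records repo and Terraform outputs (all placeholders here):

```python
# Rough sketch: push an infrastructure summary to Slack and commit it to a
# records repo at the end of a deployment playbook. The webhook URL, repo path
# and reliance on `terraform output -json` are placeholders/assumptions.
import json
import subprocess
from datetime import datetime, timezone

import requests

SLACK_WEBHOOK = "https://hooks.slack.com/services/XXX/YYY/ZZZ"  # placeholder
RECORDS_REPO = "/opt/engagement-records"                        # placeholder

def infra_summary() -> str:
    raw = subprocess.run(
        ["terraform", "output", "-json"],
        capture_output=True, text=True, check=True,
    ).stdout
    outputs = json.loads(raw)
    stamp = datetime.now(timezone.utc).isoformat()
    lines = [f"Infrastructure summary – {stamp}"]
    lines += [f"- {name}: {value.get('value')}" for name, value in outputs.items()]
    return "\n".join(lines)

def publish(summary: str) -> None:
    requests.post(SLACK_WEBHOOK, json={"text": summary}, timeout=10)
    with open(f"{RECORDS_REPO}/infra-summary.md", "w") as f:
        f.write(summary + "\n")
    subprocess.run(["git", "-C", RECORDS_REPO, "add", "infra-summary.md"], check=True)
    subprocess.run(["git", "-C", RECORDS_REPO, "commit", "-m", "Update infra summary"])

if __name__ == "__main__":
    publish(infra_summary())
```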
Integration into CI/CD
Treat payloads as code and automate your record keeping.
- Keep profiles, masks, and loaders in Git and make use of pipelines wherever possible.
- Build in CI, producing signed artefacts where needed. Keep git commit hashes of artefacts and all post-ex tooling. This should be tracked by the C2 as well, but certain (commercial) C2s do not currently have this functionality.
- Run automated lab tests against local detection rules – YARA, radare2 or even just ‘strings’ – and save the output as a zip file somewhere.
- Output a manifest (hashes, URIs, C2 settings) into Slack or Teams so all operators share the same picture and others can pick up the engagement if needed. This can be lifted into the report or Outline / CodiMD as an IOC block, saving time (a sketch follows below).
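A minimal sketch of that manifest step at the end of a build pipeline; the artefact directory, webhook URL and table layout are assumptions:

```python
# Minimal CI manifest step: hash built artefacts, capture the Git commit, and
# post a Markdown IOC-style block to Slack. The dist/ directory and webhook
# URL are placeholders; adapt to your pipeline.
import hashlib
import subprocess
from pathlib import Path

import requests

ARTEFACT_DIR = Path("dist")                                     # placeholder
SLACK_WEBHOOK = "https://hooks.slack.com/services/XXX/YYY/ZZZ"  # placeholder

def build_manifest() -> str:
    commit = subprocess.run(
        ["git", "rev-parse", "--short", "HEAD"],
        capture_output=True, text=True, check=True,
    ).stdout.strip()
    rows = ["| Artefact | SHA256 |", "| --- | --- |"]
    for artefact in sorted(ARTEFACT_DIR.iterdir()):
        if artefact.is_file():
            digest = hashlib.sha256(artefact.read_bytes()).hexdigest()
            rows.append(f"| {artefact.name} | {digest} |")
    return f"Build manifest for commit `{commit}`\n" + "\n".join(rows)

if __name__ == "__main__":
    manifest = build_manifest()
    requests.post(SLACK_WEBHOOK, json={"text": manifest}, timeout=10)
    print(manifest)
```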
Designing and Measuring Checklists
If you have got this far and decided you would like to at least consider implementing some of the ideas discussed, here is some guidance to get started.
Design Rules
- Limit to 5–9 items.
- Place at natural pause points.
- Phrase items in observable terms – things are either a ‘yes/no’ or ‘done/not done’.
- Primary reads checklist, does the action, Secondary confirms.
Metrics
These steps will help you measure whether your checklists are having an effect.
- Median checklist runtime (aim for under 30 seconds).
- Number of risky actions prevented.
- Override rate (the times operators did the action despite negative indicators) with reasons.
- Operator sentiment surveys (do they think it is making any difference).
- Reporting time saved by structured logs.
These metrics should guide pruning. If lists take too long or prevent nothing, they are the wrong lists.
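As a sketch of how a couple of these metrics could be pulled from a simple store (the checklist_runs table layout is an assumption, not a standard schema):

```python
# Sketch: median checklist runtime and override rate from a hypothetical
# checklist_runs table (started_at / finished_at as epoch seconds, overridden
# as 0/1). The schema is an assumption.
import sqlite3
import statistics

def checklist_metrics(db_path: str = "checklists.db") -> dict:
    conn = sqlite3.connect(db_path)
    rows = conn.execute(
        "SELECT finished_at - started_at, overridden FROM checklist_runs"
    ).fetchall()
    conn.close()
    if not rows:
        return {"runs": 0}
    runtimes = [r[0] for r in rows]
    overrides = sum(1 for r in rows if r[1])
    return {
        "runs": len(rows),
        "median_runtime_s": statistics.median(runtimes),  # aim for under 30s
        "override_rate": overrides / len(rows),
    }

print(checklist_metrics())
```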
Implementation Options
Slack
- Workflow Builder can present forms with fields like technique, host, action, and risk.
- Submissions post to a channel in Block Kit format.
- A script exports daily entries to Markdown tables for reporting.
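A rough sketch of that export script using the Slack Web API; the token, channel ID and the assumption that each workflow submission arrives as plain text are illustrative:

```python
# Rough sketch: pull the last 24 hours of checklist submissions from a Slack
# channel and emit a Markdown table for the report. The token, channel ID and
# plain-text submission format are assumptions.
import os
import time

import requests

TOKEN = os.environ["SLACK_BOT_TOKEN"]
CHANNEL = "C0123456789"  # placeholder channel ID

def fetch_messages(oldest: float) -> list[dict]:
    resp = requests.get(
        "https://slack.com/api/conversations.history",
        headers={"Authorization": f"Bearer {TOKEN}"},
        params={"channel": CHANNEL, "oldest": oldest, "limit": 200},
        timeout=10,
    )
    return resp.json().get("messages", [])

def to_markdown(messages: list[dict]) -> str:
    rows = ["| Time (UTC) | Submission |", "| --- | --- |"]
    for msg in reversed(messages):  # API returns newest first; flip to oldest first
        stamp = time.strftime("%H:%M:%S", time.gmtime(float(msg["ts"])))
        text = msg.get("text", "").replace("\n", "<br>")
        rows.append(f"| {stamp} | {text} |")
    return "\n".join(rows)

if __name__ == "__main__":
    print(to_markdown(fetch_messages(time.time() - 24 * 3600)))
```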
Microsoft Teams
- Adaptive Cards collect the same data as Slack.
- Power Automate or inbound Workflows store responses in SharePoint.
- Exports feed reports.
Lightweight / Vibe Coded Flask app
- Serve checklists as JSON that are displayed in the UI. This keeps checklists as code.
- Accept status submissions or updates via webhook – if your C2 has an API, this can auto-complete checklists for you.
- Store entries in SQLite or Git so any other team tooling can easily parse them.
- Export daily Markdown files; these can be pushed to CodiMD or applicable note-taking application via API.
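A minimal Flask sketch of this pattern – routes, schema and the checklists/ directory are assumptions rather than a finished tool:

```python
# Minimal sketch of the lightweight checklist service: serve checklists from
# JSON files kept in Git, accept status updates via webhook, store entries in
# SQLite. Routes, schema and the checklists/ directory are illustrative.
import json
import sqlite3
import time
from pathlib import Path

from flask import Flask, jsonify, request

app = Flask(__name__)
DB = "checklists.db"

def db() -> sqlite3.Connection:
    conn = sqlite3.connect(DB)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS entries "
        "(ts REAL, checklist TEXT, item TEXT, status TEXT, operator TEXT)"
    )
    return conn

@app.get("/checklists/<name>")
def get_checklist(name: str):
    # Checklists live as JSON files in the repo, so they are versioned like code.
    return jsonify(json.loads((Path("checklists") / f"{name}.json").read_text()))

@app.post("/entries")
def add_entry():
    e = request.get_json(force=True)
    with db() as conn:
        conn.execute(
            "INSERT INTO entries VALUES (?, ?, ?, ?, ?)",
            (time.time(), e["checklist"], e["item"], e["status"], e.get("operator", "unknown")),
        )
    return {"ok": True}

if __name__ == "__main__":
    app.run(port=5000)
```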
The aim is to maintain agility and avoid excessive process while maintaining records and measuring and reducing errors.
A Culture of Learning
Blameless Reviews
Google wrote the Bible for SRE staff several years ago. After each engagement or error, run a blameless review that describes what happened, why it made sense at the time, and what systemic improvements follow. This concept is very similar to aviation’s “just culture” and prevents errors from being buried. It does not shy away from blame where dereliction or incompetence exists, but seeks to understand the root causes as much as possible.
Aviation staff are rewarded for raising safety issues; we can contrast this to the well known anecdotes from medical professions where mistakes are often suppressed or staff ‘close ranks’ – the difference in learning and continual improvement is stark.
Black Box Thinking
Matthew Syed argues that aviation’s openness to learning explains its safety record. Red Teams who are committed to continuous improvement have no choice but to treat errors as learning opportunities and near misses as a new source of potential data. Without data, there can be no systemic learning.
Implementation Guide
Implementing too much, too fast will mean that the project doesn’t stick. It will be a flash in the pan. We can all name red team ideas that start with the best of intentions then just fall away.
First month: Draft three checklists (persistence, lateral movement, cleanup). Trial with Slack Workflow Builder.
Second month: Add CI/CD OpSec builds with different C2 profiles / opsec settings. Capture manifests in Slack.
Third month: Run blameless reviews or AARs. Prune checklists. Expand to initial access and privilege escalation.
Conclusion
Aviation did not become safe by chance. It became safe through deliberate cultural change: checklists to catch the small mistakes, CRM to reshape teamwork, and black-box learning to turn failure into progress. Red Teams operate in fragile client environments where the same dynamics apply.
By adopting these practices – short lists at pause points, Primary and Secondary operators with clear roles, opsec built into CI/CD, and blameless learning – Red Teams can increase their credibility, reduce risk, and deliver greater value to clients.
The lesson is not to impersonate pilots or increase ‘militarisation’, but to adopt the disciplines that made aviation trustworthy. These disciplines make Red Teams sharper, faster, and safer.
Cyber is not the special snowflake industry that cannot learn from other safety critical, global industries. We should be intellectually curious enough to take the best parts of other sectors and adapt / incorporate where we can.
References
The below references were useful in compiling the case studies / examples for this post.
- FAA Lessons Learned: Eastern Air Lines Flight 401, 1972.
- NTSB AAR 79-07: United Airlines Flight 173, Portland, Oregon, 1978.
- ATSB: Qantas Flight 32, Airbus A380 uncontained engine failure, 2010.
- NASA CR-177549 (Degani & Wiener): Human Factors of Flight-Deck Checklists.
- Haynes AB et al. A surgical safety checklist to reduce morbidity and mortality in a global population. NEJM, 2009.
- Skybrary: Threat and Error Management.
- FAA Advisory Circular 120-51E: Crew Resource Management Training.
- Outflank: AceLdr UDRL Project.
- Syed, M. Black Box Thinking, 2015.
- Google SRE: Postmortem Culture.
- Not Invented Here: https://en.wikipedia.org/wiki/Not_invented_here