Skip to content

Open Source Supply Chain: Trust at Internet Scale

Open Source Supply Chain: Trust at Internet Scale

Section titled “Open Source Supply Chain: Trust at Internet Scale”

Every day, millions of developers run npm install or pip install, downloading code written by strangers and executing it on their machines, their servers, their production systems. They do this without reading the code, without verifying the authors, without understanding the dependencies.

We have accidentally built the largest trust network in human history—and it’s terrifyingly fragile.


flowchart TB
    DEV[Developer] -->|"npm install express"| NPM[npm Registry]
    NPM -->|"downloads"| EXP[express<br/>+ 30 dependencies]
    EXP --> DEP1[accepts<br/>+ its dependencies]
    EXP --> DEP2[array-flatten]
    EXP --> DEP3[body-parser<br/>+ 8 dependencies]
    DEP1 --> DEP1A[mime-types<br/>+ 2 dependencies]
    DEP3 --> DEP3A[debug]
    DEP3 --> DEP3B[...]

    subgraph "What You Trusted"
        TRUSTED["You trust:
        • express maintainers
        • All 30 dependency maintainers
        • npm registry not compromised
        • Your network not MitM'd
        • All dependency maintainers not hacked"]
    end

A simple web framework (Express) pulls in 30+ dependencies. Each is a trust relationship.

PackageDirect DependenciesTransitive DependenciesUnique Maintainers
express3057~40
react38~10
lodash00~5
create-react-app50+1,500+~500
AWS SDK100+400+~200

Trust equation:

P(supply chain safe) = P(each dependency safe)^(number of dependencies)

If each dependency has 99.9% probability of being safe:

  • 10 dependencies: 0.999^10 = 99.0% safe
  • 100 dependencies: 0.999^100 = 90.5% safe
  • 1,000 dependencies: 0.999^1000 = 36.8% safe

flowchart TB
    YOU[Your Code] -->|"depends on"| PKG[Package]
    PKG -->|"maintained by"| MAINT[Maintainer<br/>often 1 person]
    MAINT -->|"has access via"| NPM_ACC[npm Account]
    NPM_ACC -->|"secured by"| PASS[Password + 2FA?]
    PKG -->|"built by"| CI[CI/CD Pipeline]
    CI -->|"runs on"| GH[GitHub Actions/<br/>other infra]
    PKG -->|"stored at"| REG[Registry<br/>npm/PyPI/etc]

    subgraph "Attack Surfaces"
        MAINT_COMP[Maintainer computer compromised]
        MAINT_SOCIAL[Maintainer socially engineered]
        ACC_STOLEN[Account credentials stolen]
        CI_POISON[CI pipeline poisoned]
        REG_BREACH[Registry breached]
        TYPO[Typosquatting package published]
    end
EntityWhat You Trust Them ForP(compromise)Impact if Compromised
Package maintainerNot malicious, competent0.001/yearAll users of package
Maintainer’s accountNot hacked0.01/yearAll users of package
Registry (npm/PyPI)Not breached, honest0.001/yearAll users of registry
CI/CD systemNot poisoned0.005/yearAll packages using that CI
Your networkNot MitM’d0.0001/installYour system

Many critical packages are maintained by one or two people:

PackageWeekly DownloadsMaintainersBus Factor
core-js30M+1 (Denis Pushkarev)1
colors20M+1 (Marak Squires)1
node-ipc1M+11
left-pad2.5M+ (at peak)11

These single individuals have implicit commit access to millions of production systems.


Part 3: Case Study — The xz Backdoor (2024)

Section titled “Part 3: Case Study — The xz Backdoor (2024)”

In early 2024, a backdoor was discovered in xz-utils, a compression library used by virtually every Linux system.

flowchart TB
    subgraph "The Long Con"
        Y1[2021: 'Jia Tan' appears<br/>helpful contributor]
        Y2[2022: Gains trust<br/>makes good contributions]
        Y3[2023: Becomes co-maintainer<br/>maintainer pressured to share access]
        Y4[2024: Inserts backdoor<br/>via obfuscated test files]
    end

    Y1 --> Y2 --> Y3 --> Y4

    subgraph "The Backdoor"
        BD[Hidden code in test files]
        BD -->|"activated during build"| SSH[Compromises SSHD]
        SSH -->|"allows"| RCE[Remote Code Execution]
    end

    Y4 --> BD
  1. Social engineering the maintainer: The original maintainer was burned out. “Jia Tan” offered to help. Community members (possibly sock puppets) pressured maintainer to accept help.

  2. Years of trust-building: Jia Tan made legitimate contributions for 2+ years, building reputation.

  3. Gaining commit access: Eventually became co-maintainer with ability to merge code.

  4. Obfuscated payload: Backdoor hidden in binary test files, not visible in source review.

  5. Targeted distribution: Specifically designed to compromise OpenSSH on systemd-based Linux.

FactorHow It Helped Attack
Maintainer burnoutCreated opportunity for “helpful” attacker
Trust accumulationYears of good behavior built reputation
ComplexityBackdoor hidden in build process, not source
Dependency chainsxz is dependency of SSH, which is everywhere
Binary test filesNot human-readable, not reviewed

Luck. A Microsoft developer (Andres Freund) noticed SSH was using slightly more CPU than expected. He investigated for performance reasons and found the backdoor.

This was not caught by:

  • Code review
  • Automated scanning
  • Security audits
  • Reproducible builds (not widely deployed)

Delegation Risk Calculation: If xz Hadn’t Been Caught

Section titled “Delegation Risk Calculation: If xz Hadn’t Been Caught”

Scope of compromise:

  • Every Linux system with affected xz version
  • Every SSH server using affected systemd
  • Estimated: 50-100 million systems

Potential damage:

Delegation Risk = P(exploitation) × Number of systems × Damage per system
ScenarioP(exploitation)SystemsDamage/SystemTotal Delegation Risk
Targeted espionage0.001100M$1,000$100B
Widespread ransomware0.01100M$10,000$10T
Critical infrastructure0.00011M critical$10M$1T

The xz attack could have been the most damaging cyberattack in history if not for one person’s curiosity about CPU usage.


AttackYearVectorImpact
left-pad removal2016Maintainer unpublished packageBroke thousands of builds
event-stream2018Trusted maintainer added, inserted malwareBitcoin wallet theft
ua-parser-js2021Account compromiseCryptominer installed
colors & faker2022Maintainer sabotage (protest)Broke many applications
node-ipc2022Maintainer protest (Ukraine)Deleted files on Russian IPs
PyPI typosquattingOngoingMalicious packages with similar namesVarious
xz backdoor2024Long-term social engineering(Caught before impact)
flowchart TB
    ATTACK[Supply Chain Attack]

    ATTACK --> ACC[Account Compromise]
    ATTACK --> MAINT[Maintainer-Based]
    ATTACK --> REG[Registry-Based]
    ATTACK --> BUILD[Build-Based]

    ACC --> ACC1[Credential theft]
    ACC --> ACC2[Session hijacking]

    MAINT --> MAINT1[Maintainer goes rogue]
    MAINT --> MAINT2[Maintainer compromised]
    MAINT --> MAINT3[Maintainer transfer to attacker]

    REG --> REG1[Typosquatting]
    REG --> REG2[Dependency confusion]
    REG --> REG3[Registry breach]

    BUILD --> BUILD1[CI poisoning]
    BUILD --> BUILD2[Build env compromise]
    BUILD --> BUILD3[Compiler backdoor]

Many attacks exploit trust decay—the original trusted maintainer is gone, but the trust persists:

Year 0: Alice creates popular package, everyone trusts it
Year 3: Alice burns out, stops maintaining
Year 4: Bob offers to help, Alice transfers ownership
Year 5: Bob is either compromised or was always malicious
Year 5+: Everyone still trusts "the package" but Bob controls it

The trust was in Alice, but it transferred to Bob implicitly.


Part 5: The Trust Architecture of Open Source

Section titled “Part 5: The Trust Architecture of Open Source”
flowchart TB
    subgraph "What Users Assume"
        A1[Package is safe]
        A2[Maintainer is trustworthy]
        A3[Code matches source]
        A4[No hidden dependencies]
    end

    subgraph "What's Actually Verified"
        V1[Almost nothing]
    end

    A1 -.->|"not verified"| V1
    A2 -.->|"not verified"| V1
    A3 -.->|"rarely"| V1
    A4 -.->|"sometimes"| V1
MechanismWhat It VerifiesAdoptionEffectiveness
Package signingPublisher identityLowModerate
2FA on accountsAccount not trivially stolenGrowingModerate
Reproducible buildsBinary matches sourceVery lowHigh (when used)
Security auditsCode is safeVery low (expensive)High (when done)
Dependency scanningKnown vulnerabilitiesGrowingModerate
SBOMKnow what you haveLowHelps remediation
Lock filesPin versionsHighPrevents some drift

Effort to publish a package:

1. Create npm account (5 minutes)
2. Write package (varies)
3. npm publish (1 minute)
4. Done

Effort to verify a package:

1. Read all code (hours to days)
2. Verify maintainer identity (hard)
3. Audit dependencies recursively (impossible at scale)
4. Monitor for future changes (ongoing)

The asymmetry is 1000:1 or worse.


Part 6: Delegation Risk Analysis of the Open Source Ecosystem

Section titled “Part 6: Delegation Risk Analysis of the Open Source Ecosystem”

For a typical npm package:

RiskP(occurrence/year)ImpactDelegation Risk
Maintainer account compromised0.01All usersDepends on users
Malicious dependency added0.005All usersDepends on users
Vulnerability introduced0.05All usersModerate
Package unpublished0.001Build breakageLow

Delegation Risk scales with popularity:

Package Delegation Risk = Base_risk × Number_of_dependents × Impact_per_dependent
PackageDependentsBase RiskImpactAnnual Delegation Risk
lodash150,000+ packages0.001$100K avg$15M
express80,000+ packages0.001$200K avg$16M
core-js50,000+ packages0.002 (single maintainer)$100K avg$10M

For the entire npm ecosystem:

Ecosystem Delegation Risk = Σ(Package Delegation Risk) + P(registry compromise) × Total impact
ComponentAnnual Delegation Risk
Sum of top 1000 packages~$500M
Long tail (2M other packages)~$200M
Registry-level risk~$10B (low P, catastrophic impact)
Total npm Ecosystem Delegation Risk~$10B/year

Option A: Centralized Verification (App Store Model)

Section titled “Option A: Centralized Verification (App Store Model)”
flowchart TB
    DEV[Developer] -->|"submits"| REVIEW[Central Review Team]
    REVIEW -->|"approves"| REG[Verified Registry]
    USER[User] -->|"installs verified only"| REG

Like Apple’s App Store for packages.

AspectProCon
QualityHigher average qualityBottleneck, slow
SecurityProfessional reviewStill imperfect (malware in App Store exists)
Innovation-Gatekeeping stifles innovation
Cost-Who pays for reviewers?

Delegation Risk impact:

Verification catch rate: 80%
New ecosystem Delegation Risk = 0.20 × Original Delegation Risk = $2B/year
Cost of verification: $500M+/year
Net: Maybe positive ROI, but changes ecosystem nature
flowchart TB
    subgraph "Trust Web"
        A[Maintainer A<br/>verified] -->|"vouches for"| B[Maintainer B]
        B -->|"vouches for"| C[Maintainer C]
        CORP[Known Corporation] -->|"vouches for"| D[Maintainer D]
    end

    USER[User] -->|"trusts"| A
    USER -->|"transitively trusts"| B
    USER -->|"transitively trusts"| C

Trust flows through vouching relationships.

AspectProCon
DecentralizedNo single bottleneckHard to bootstrap
ScalesVerification distributedTrust decay problems
Community-drivenLeverages existing relationshipsCliques, politics

Delegation Risk impact:

Effective trust verification: 50%
New ecosystem Delegation Risk = 0.50 × Original Delegation Risk = $5B/year
Cost: Low (community-based)
Net: Moderate improvement, culturally difficult

Option C: Economic Incentives (Bug Bounties + Insurance)

Section titled “Option C: Economic Incentives (Bug Bounties + Insurance)”
flowchart TB
    MAINT[Maintainer] -->|"posts bond"| BOND[Security Bond<br/>$10K-$1M]
    PKG[Package] -->|"if vulnerability found"| BOUNTY[Bounty Payout<br/>from bond]
    USER[User] -->|"pays premium"| INS[Supply Chain Insurance]
    INS -->|"if incident"| USER

Market-based trust verification.

AspectProCon
Incentive-alignedMaintainers have skin in gameExcludes hobbyists
ScalableMarket sets pricesMay not cover long-term attacks
InsuranceRisk transferred to those who can assess itExpensive

Delegation Risk impact:

With $100K average bond per popular package (top 10K):
- Direct Delegation Risk reduction from bonds: $1B/year
- Indirect effect (better practices): 2-3×
New ecosystem Delegation Risk = $3B/year
Cost: Passed to ecosystem (higher package costs)

Option D: Reproducible Builds + Transparency Logs

Section titled “Option D: Reproducible Builds + Transparency Logs”
flowchart TB
    SRC[Source Code<br/>GitHub] -->|"deterministic build"| BUILD[Build Process]
    BUILD -->|"produces"| BIN[Binary/Package]
    BIN -->|"hash logged"| LOG[Transparency Log<br/>Immutable, append-only]

    VERIFY[Verifier<br/>Anyone can rebuild] -->|"rebuilds from source"| REBUILD[Rebuilt Package]
    REBUILD -->|"compare hash"| BIN

    USER[User] -->|"checks"| LOG

Technical verification that binaries match source.

AspectProCon
Cryptographic guaranteeCan verify build integrityDoesn’t verify source is safe
Tamper-evidentLog shows all changesComplex infrastructure
Community verificationAnyone can verifyFew actually do

Delegation Risk impact:

Catches: Build tampering, CI poisoning, some account compromises
Doesn't catch: Malicious source commits, social engineering
Delegation Risk reduction: ~30% (significant portion of attacks)
New ecosystem Delegation Risk = $7B/year
Cost: Infrastructure + ecosystem adoption

Optimal architecture combines elements:

flowchart TB
    subgraph "Layer 1: Publication"
        SIGN[Package Signing<br/>Identity verification]
        TWO_FA[2FA Required]
    end

    subgraph "Layer 2: Build"
        REPRO[Reproducible Builds]
        LOG[Transparency Log]
    end

    subgraph "Layer 3: Economics"
        BOND[Maintainer Bonds<br/>for critical packages]
        BOUNTY[Bug Bounties]
    end

    subgraph "Layer 4: Community"
        AUDIT[Community Audits]
        TRUST[Trust Endorsements]
    end

    SIGN --> REPRO
    REPRO --> BOND
    BOND --> AUDIT

Combined Delegation Risk:

Layer 1 (identity): -20% Delegation Risk
Layer 2 (builds): -30% Delegation Risk (of remaining)
Layer 3 (economics): -40% Delegation Risk (of remaining)
Layer 4 (community): -20% Delegation Risk (of remaining)
Combined: 0.80 × 0.70 × 0.60 × 0.80 = 0.27
New ecosystem Delegation Risk = $2.7B/year (73% reduction)

Behind every package is a human (or small team):

Maintainer demographics:

  • Most are volunteers
  • Many maintain critical packages in spare time
  • Burnout is epidemic
  • Compensation is rare

The xz maintainer’s situation:

  • Solo maintainer of critical infrastructure
  • Burnout and mental health struggles (publicly documented)
  • Community pressure to accept help
  • Transferred trust to someone who turned out to be an attacker
PartyBenefit from PackageCost of MaintenanceCost of Security Incident
MaintainerReputation, maybe donationsTime, stress, liabilityReputation, harassment
UsersFree functionalityNothingLost data, breach
CompaniesFree infrastructureNothing (usually)Major (but passed to customers)

The maintainer bears costs but captures almost none of the benefit.

1. Corporate sponsorship of critical packages

Company using critical package → pays maintainer salary

Examples:

  • Tidelift (marketplace for maintainer support)
  • GitHub Sponsors
  • Corporate open source offices

2. Foundation stewardship

Critical packages → maintained by foundation with funding

Examples:

  • Linux Foundation
  • Apache Foundation
  • Python Software Foundation

3. Paid security audits for critical packages

Fund → Security firm → Audits critical packages → Reports

Examples:

  • OSTIF (Open Source Technology Improvement Fund)
  • Google’s OSS-Fuzz
  • Various corporate-funded audits

If top 1000 critical packages had:

  • Full-time paid maintainer
  • Regular security audits
  • 2+ person maintenance team (no single point of failure)
Risk reduction per package: ~80%
Coverage: Top 1000 packages = 90% of ecosystem impact
Cost: ~$100M/year (1000 × $100K average)
Delegation Risk reduction: 0.80 × 0.90 × $10B = $7.2B/year
ROI: 72× ($7.2B saved for $100M spent)

This is perhaps the highest-ROI security investment possible.


AI systems have similar dependency structures:

  • Models depend on training data from many sources
  • Pipelines depend on libraries with their own dependencies
  • Deployment depends on infrastructure code

AI supply chain risks:

  • Poisoned training data
  • Backdoored model weights
  • Compromised ML libraries
  • Malicious model cards
Open SourceAI Systems
Package maintainerModel creator
npm registryModel hub (HuggingFace)
DependenciesFine-tuning, embeddings, tools
Transitive trust explosionModel-to-model dependencies

1. Minimize dependencies

  • Fewer dependencies = fewer trust relationships
  • “Least Dependency” principle

2. Verify what you can

  • Reproducible builds → Reproducible training
  • Hash verification → Model hash verification
  • Build logs → Training logs

3. Monitor for drift

  • Version pinning → Model version pinning
  • Dependency scanning → Model behavior monitoring

4. Fund critical infrastructure

  • Maintainer support → AI safety research support
  • The people building the foundations need resources