
Bibliography & References

Key papers, books, and resources that inform this framework, organized by domain.

  • Drexler, K.E. (2019). Reframing Superintelligence: Comprehensive AI Services as General Intelligence. Technical Report, Future of Humanity Institute. Link
  • Foundational proposal for building superintelligence as an ecosystem of narrow services
  • Greenblatt, R., et al. (2024). AI Control: Improving Safety Despite Intentional Subversion. Redwood Research. arXiv
  • Protocols for maintaining safety even if models are scheming
  • Bowman, S., et al. (2022). Measuring Progress on Scalable Oversight for Large Language Models. arXiv
  • Methods for humans to effectively oversee increasingly capable AI
  • Bai, Y., et al. (2022). Constitutional AI: Harmlessness from AI Feedback. Anthropic. arXiv
  • Training models to follow principles through self-critique

Iterated Distillation and Amplification (IDA)

  • Christiano, P., et al. (2018). Supervising strong learners by amplifying weak experts. arXiv
  • Recursive amplification of human judgment through AI assistance
  • Gwern (2016, updated 2023). Why Tool AIs Want to Be Agent AIs. gwern.net
  • Economic and practical pressures toward agentic AI systems
  • Dalrymple, D., et al. (2024). Towards Guaranteed Safe AI: A Framework for Ensuring Robust and Reliable AI Systems. arXiv
  • Formal verification approaches to AI safety

  • NUREG-1150 (1990). Severe Accident Risks: An Assessment for Five US Nuclear Power Plants. US Nuclear Regulatory Commission.
  • Landmark study establishing probabilistic risk assessment (PRA) methodology
  • Vesely, W.E., et al. (1981). Fault Tree Handbook. NUREG-0492, US Nuclear Regulatory Commission. PDF
  • Standard reference for fault tree construction and analysis (a quantification sketch follows this list)
  • IAEA Safety Standards Series No. SF-1 (2006). Fundamental Safety Principles.
  • International framework for nuclear safety philosophy
  • Rasmussen, N.C., et al. (1975). Reactor Safety Study: An Assessment of Accident Risks in US Commercial Nuclear Power Plants (WASH-1400). US Nuclear Regulatory Commission.
  • Original “Rasmussen Report” establishing quantitative risk assessment
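
A minimal sketch of the fault-tree quantification these references formalize: basic-event probabilities combined through AND/OR gates under the usual independence assumption. The event names and probabilities below are invented for illustration, not taken from any of the cited studies.

```python
# Minimal fault-tree quantification sketch. Assumes statistically independent
# basic events; all event names and probabilities are hypothetical.

def and_gate(*probs: float) -> float:
    """Gate output fails only if every input fails: product of probabilities."""
    result = 1.0
    for p in probs:
        result *= p
    return result

def or_gate(*probs: float) -> float:
    """Gate output fails if any input fails: complement of all inputs surviving."""
    survive = 1.0
    for p in probs:
        survive *= (1.0 - p)
    return 1.0 - survive

# Hypothetical basic-event failure probabilities (per demand).
pump_a_fails = 1e-3
pump_b_fails = 1e-3
valve_sticks = 5e-4
operator_error = 1e-2

# Top event: cooling is lost if both redundant pumps fail, or the valve
# sticks, or the operator intervenes incorrectly.
p_loss_of_cooling = or_gate(
    and_gate(pump_a_fails, pump_b_fails),
    valve_sticks,
    operator_error,
)
print(f"P(loss of cooling) ~ {p_loss_of_cooling:.2e}")  # ~1.05e-02, dominated by operator error
```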

  • Litterman, R. (1996). Hot Spots and Hedges. Journal of Portfolio Management.
  • Introduction to risk contribution analysis
  • Tasche, D. (2008). Capital Allocation to Business Units and Sub-Portfolios: the Euler Principle. arXiv
  • Mathematical foundations of coherent risk allocation
  • Jorion, P. (2006). Value at Risk: The New Benchmark for Managing Financial Risk (3rd ed.). McGraw-Hill.
  • Standard reference for VaR methodology
  • Acerbi, C., & Tasche, D. (2002). On the coherence of expected shortfall. Journal of Banking & Finance.
  • Why expected shortfall (ES/CVaR) is preferred to VaR for tail risk (see the sketch after this list)
  • Artzner, P., et al. (1999). Coherent Measures of Risk. Mathematical Finance.
  • Axiomatic foundations for risk measurement
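
To make the VaR versus expected-shortfall contrast concrete, the sketch below estimates both at the 99% level from a synthetic heavy-tailed loss sample. The loss distribution and confidence level are arbitrary illustrative choices, not part of the cited methodology.

```python
# Historical-simulation style estimates of value at risk (VaR) and expected
# shortfall (ES/CVaR) at the 99% level, on synthetic heavy-tailed losses.
import random

random.seed(0)
losses = [random.paretovariate(2.0) for _ in range(100_000)]  # hypothetical loss sample

alpha = 0.99
ordered = sorted(losses)
cutoff = int(alpha * len(ordered))

var_99 = ordered[cutoff]            # loss level exceeded ~1% of the time
tail = ordered[cutoff:]             # the worst ~1% of outcomes
es_99 = sum(tail) / len(tail)       # average severity within that tail

print(f"VaR(99%) = {var_99:.2f}")   # ~10 for this distribution
print(f"ES(99%)  = {es_99:.2f}")    # ~20: ES sees how bad the tail is, VaR does not
```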

  • Saltzer, J.H., & Schroeder, M.D. (1975). The Protection of Information in Computer Systems. Proceedings of the IEEE.
  • Classic paper establishing security design principles
  • Schneier, B. (2000). Secrets and Lies: Digital Security in a Networked World. Wiley.
  • Accessible introduction to security engineering principles
  • Lamport, L., Shostak, R., & Pease, M. (1982). The Byzantine Generals Problem. ACM Transactions on Programming Languages and Systems.
  • Foundational paper on consensus despite malicious actors
  • Avizienis, A. (1985). The N-Version Approach to Fault-Tolerant Software. IEEE Transactions on Software Engineering.
  • Independently developed implementations of the same specification, voted for reliability (a toy voting sketch follows this list)
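
A toy rendering of the N-version idea: run independently developed implementations of one specification and accept the majority answer, so a single faulty version is masked. The three "versions" here are trivial stand-ins invented for illustration.

```python
# N-version programming sketch: majority vote over independently written
# implementations of one specification (absolute value). Stand-ins are invented.
from collections import Counter

def version_a(x: int) -> int:
    return abs(x)

def version_b(x: int) -> int:
    return x if x >= 0 else -x

def version_c(x: int) -> int:
    return x  # deliberately faulty stand-in: wrong for negative inputs

def voted_abs(x: int) -> int:
    """Return the majority answer across versions; fail loudly if there is none."""
    tally = Counter(version(x) for version in (version_a, version_b, version_c))
    answer, votes = tally.most_common(1)[0]
    if votes < 2:
        raise RuntimeError(f"no majority for input {x}: {dict(tally)}")
    return answer

for x in (3, 0, -7):
    print(x, "->", voted_abs(x))  # the two correct versions outvote the faulty one
```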

  • Myerson, R.B. (1981). Optimal Auction Design. Mathematics of Operations Research.
  • When truthful mechanisms are optimal
  • Vickrey, W. (1961). Counterspeculation, Auctions, and Competitive Sealed Tenders. Journal of Finance.
  • Clarke, E.H. (1971). Multipart Pricing of Public Goods. Public Choice.
  • Groves, T. (1973). Incentives in Teams. Econometrica.
  • Foundations of incentive-compatible mechanism design
  • Bolton, P., & Dewatripont, M. (2005). Contract Theory. MIT Press.
  • Comprehensive treatment of principal-agent problems
  • Shapley, L.S. (1953). A Value for n-Person Games. In Contributions to the Theory of Games II.
  • Lundberg, S.M., & Lee, S.I. (2017). A Unified Approach to Interpreting Model Predictions (SHAP). arXiv
  • Foundational and modern applications of Shapley values (a worked example follows this list)
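
As a reminder of what a Shapley value is, here is a minimal exact computation for a small hypothetical coalition game. The component names and characteristic function are made up, and the brute-force enumeration only scales to a handful of players; SHAP exists to approximate this kind of attribution at scale.

```python
# Exact Shapley values for a small cooperative game, by averaging each
# player's marginal contribution over all join orders. The component names
# and the characteristic function are hypothetical.
from itertools import permutations

players = ["retrieval", "planner", "executor"]

# v(S): made-up "risk reduction" achieved by each coalition of components.
v = {
    frozenset(): 0.0,
    frozenset({"retrieval"}): 1.0,
    frozenset({"planner"}): 2.0,
    frozenset({"executor"}): 2.0,
    frozenset({"retrieval", "planner"}): 4.0,
    frozenset({"retrieval", "executor"}): 4.0,
    frozenset({"planner", "executor"}): 5.0,
    frozenset({"retrieval", "planner", "executor"}): 8.0,
}

shapley = {p: 0.0 for p in players}
orders = list(permutations(players))
for order in orders:
    coalition = frozenset()
    for p in order:
        # Marginal contribution of p when joining the players before it.
        shapley[p] += v[coalition | {p}] - v[coalition]
        coalition = coalition | {p}
shapley = {p: total / len(orders) for p, total in shapley.items()}

print(shapley)  # {'retrieval': 2.0, 'planner': 3.0, 'executor': 3.0}; sums to v(grand coalition)
```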

  • Girard, J.-Y. (1987). Linear Logic. Theoretical Computer Science, 50(1). DOI
  • Resource-sensitive logic foundational to consumable delegation risk budgets (see the use-once sketch after this list)
  • Fritz, T. (2020). A Synthetic Approach to Markov Kernels, Conditional Independence and Theorems on Sufficient Statistics. Advances in Mathematics, 370. arXiv
  • Compositional probability theory for risk propagation
  • Dennis, J.B., & Van Horn, E.C. (1966). Programming Semantics for Multiprogrammed Computations. Communications of the ACM.
  • Miller, M.S. (2006). Robust Composition: Towards a Unified Approach to Access Control and Concurrency Control. PhD thesis, Johns Hopkins University.
  • Foundational work on capability systems
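
One way to read the linear-logic connection is operational: a delegation risk budget behaves like a use-once resource that can be split and consumed but never duplicated or reused. The sketch below is only that analogy, not Girard's calculus; the class, method names, and numbers are invented for illustration.

```python
# Use-once risk budget sketch, mirroring the linear-logic discipline that a
# resource is consumed exactly once. Class, methods, and numbers are illustrative.

class RiskBudget:
    """A failure-probability allowance that can be split or spent, never reused."""

    def __init__(self, amount: float):
        self._amount = amount
        self._consumed = False

    def split(self, portion: float) -> "RiskBudget":
        """Carve out a sub-budget for a delegate; the parent keeps the remainder."""
        if self._consumed or portion > self._amount:
            raise ValueError("budget unavailable")
        self._amount -= portion
        return RiskBudget(portion)

    def spend(self) -> float:
        """Consume this budget entirely; any further use is an error."""
        if self._consumed:
            raise ValueError("budget already consumed")
        self._consumed = True
        return self._amount

parent = RiskBudget(1e-4)           # total tolerable failure probability
child = parent.split(2e-5)          # hand a slice to a delegated subtask
print(f"{child.spend():.1e}")       # 2.0e-05
print(f"{parent.spend():.1e}")      # 8.0e-05 left for everything else
# Calling child.spend() again would raise: the resource is gone.
```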

  • Newman, S. (2015). Building Microservices. O’Reilly.
  • Decomposition patterns for software systems
  • Beyer, B., et al. (2016). Site Reliability Engineering: How Google Runs Production Systems. O’Reilly. Free online
  • Error budgets, monitoring, and incident response (error-budget arithmetic is sketched after this list)
  • Nygard, M. (2018). Release It! Design and Deploy Production-Ready Software (2nd ed.). Pragmatic Bookshelf.
  • Patterns for resilient systems
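
The error-budget idea reduces to simple arithmetic: the SLO implies an allowed amount of unreliability per window, and the unspent remainder governs how much change risk can still be taken. The SLO and incident figures below are hypothetical.

```python
# Error-budget arithmetic in the style of the SRE book. Numbers are hypothetical.

slo_availability = 0.999            # target: 99.9% of the window is "good"
window_minutes = 30 * 24 * 60       # 30-day rolling window

budget_minutes = (1 - slo_availability) * window_minutes   # 43.2 minutes
downtime_minutes = 25.0             # hypothetical incident impact so far

remaining = budget_minutes - downtime_minutes
print(f"budget    {budget_minutes:.1f} min / window")
print(f"spent     {downtime_minutes:.1f} min")
print(f"remaining {remaining:.1f} min")
if remaining <= 0:
    print("Budget exhausted: freeze risky launches until the window rolls over.")
```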

  • ISO 26262:2018. Road vehicles — Functional safety.
  • International standard for automotive safety integrity levels
  • Palin, R., & Goff, L. (2010). ASIL Decomposition: The Good, the Bad and the Ugly. SAE Technical Paper.
  • Practical guidance on safety requirement allocation

  • MIT OpenCourseWare: Nuclear Systems Design Project (22.033)
  • Stanford: CS224N Natural Language Processing
  • Coursera: Financial Engineering and Risk Management (Columbia)
  • LangChain: Agent orchestration framework
  • Guardrails AI: LLM output validation
  • OpenTelemetry: Observability framework

If referencing this documentation:

AI Safety Framework: Structural Constraints for Safe AI Systems.
https://[domain]/