Bibliography & External References
Key papers, books, and resources that inform this framework, organized by domain.
AI Safety Foundations
Comprehensive AI Services (CAIS)
- Drexler, K.E. (2019). Reframing Superintelligence: Comprehensive AI Services as General Intelligence. Technical Report, Future of Humanity Institute. Link
- Foundational proposal for building superintelligence as an ecosystem of narrow services
AI Control
- Greenblatt, R., et al. (2024). AI Control: Improving Safety Despite Intentional Subversion. Redwood Research. arXiv
- Protocols for maintaining safety even if models are scheming
Scalable Oversight
- Bowman, S., et al. (2022). Measuring Progress on Scalable Oversight for Large Language Models. arXiv
- Methods for humans to effectively oversee increasingly capable AI
Constitutional AI
- Bai, Y., et al. (2022). Constitutional AI: Harmlessness from AI Feedback. Anthropic. arXiv
- Training models to follow principles through self-critique
Iterated Distillation and Amplification (IDA)
- Christiano, P., et al. (2018). Supervising strong learners by amplifying weak experts. arXiv
- Recursive amplification of human judgment through AI assistance
Tool AI vs Agent AI
- Gwern (2016, updated 2023). Why Tool AIs Want to Be Agent AIs. gwern.net
- Economic and practical pressures toward agentic AI systems
Guaranteed Safe AI
- Dalrymple, D., et al. (2024). Towards Guaranteed Safe AI: A Framework for Ensuring Robust and Reliable AI Systems. arXiv
- Formal verification approaches to AI safety
Nuclear Safety & PRA
Probabilistic Risk Assessment
- NUREG-1150 (1990). Severe Accident Risks: An Assessment for Five US Nuclear Power Plants. US Nuclear Regulatory Commission.
- Landmark study establishing PRA methodology
Fault Tree Analysis
- Vesely, W.E., et al. (1981). Fault Tree Handbook. NUREG-0492, US Nuclear Regulatory Commission. PDF
- Standard reference for fault tree construction and analysis
Defense in Depth
- IAEA Safety Standards Series No. SF-1 (2006). Fundamental Safety Principles.
- International framework for nuclear safety philosophy
Event Tree Analysis
- Rasmussen, N.C., et al. (1975). Reactor Safety Study: An Assessment of Accident Risks in US Commercial Nuclear Power Plants (WASH-1400). US Atomic Energy Commission.
- Original “Rasmussen Report” establishing quantitative risk assessment
Financial Risk Management
Risk Budgeting & Allocation
- Litterman, R. (1996). Hot Spots and Hedges. Journal of Portfolio Management.
- Introduction to risk contribution analysis
Euler Allocation
- Tasche, D. (2008). Capital Allocation to Business Units and Sub-Portfolios: the Euler Principle. arXiv
- Mathematical foundations of coherent risk allocation
Value at Risk
- Jorion, P. (2006). Value at Risk: The New Benchmark for Managing Financial Risk (3rd ed.). McGraw-Hill.
- Standard reference for VaR methodology
Expected Shortfall
- Acerbi, C., & Tasche, D. (2002). On the coherence of expected shortfall. Journal of Banking & Finance.
- Why ES/CVaR is preferred to VaR for tail risk
Coherent Risk Measures
- Artzner, P., et al. (1999). Coherent Measures of Risk. Mathematical Finance.
- Axiomatic foundations for risk measurement
Security Engineering
Least Privilege
- Saltzer, J.H., & Schroeder, M.D. (1975). The Protection of Information in Computer Systems. Proceedings of the IEEE.
- Classic paper establishing security design principles
Defense in Depth
- Schneier, B. (2000). Secrets and Lies: Digital Security in a Networked World. Wiley.
- Accessible introduction to security engineering principles
Byzantine Fault Tolerance
- Lamport, L., Shostak, R., & Pease, M. (1982). The Byzantine Generals Problem. ACM Transactions on Programming Languages and Systems.
- Foundational paper on consensus despite malicious actors
N-Version Programming
- Avizienis, A. (1985). The N-Version Approach to Fault-Tolerant Software. IEEE Transactions on Software Engineering.
- Independent implementations for reliability
Mechanism Design
Revelation Principle
- Myerson, R.B. (1981). Optimal Auction Design. Mathematics of Operations Research.
- When truthful mechanisms are optimal
VCG Mechanism
- Vickrey, W. (1961). Counterspeculation, Auctions, and Competitive Sealed Tenders. Journal of Finance.
- Clarke, E.H. (1971). Multipart Pricing of Public Goods. Public Choice.
- Groves, T. (1973). Incentives in Teams. Econometrica.
- Foundations of incentive-compatible mechanism design
Contract Theory
- Bolton, P., & Dewatripont, M. (2005). Contract Theory. MIT Press.
- Comprehensive treatment of principal-agent problems
Shapley Values
- Shapley, L.S. (1953). A Value for n-Person Games. In Contributions to the Theory of Games II.
- Lundberg, S.M., & Lee, S.I. (2017). A Unified Approach to Interpreting Model Predictions (SHAP). arXiv
- Foundational theory and modern applications of Shapley values
Formal Methods & Type Systems
Linear Logic
- Girard, J.-Y. (1987). Linear Logic. Theoretical Computer Science, 50(1). DOI
- Resource-sensitive logic foundational to consumable delegation risk budgets
Markov Categories
- Fritz, T. (2020). A Synthetic Approach to Markov Kernels, Conditional Independence and Theorems on Sufficient Statistics. Advances in Mathematics, 370. arXiv
- Compositional probability theory for risk propagation
Capability-Based Security
- Dennis, J.B., & Van Horn, E.C. (1966). Programming Semantics for Multiprogrammed Computations. Communications of the ACM.
- Miller, M.S. (2006). Robust Composition: Towards a Unified Approach to Access Control and Concurrency Control. PhD thesis, Johns Hopkins University.
- Foundational work on capability systems
Software Engineering
Microservices Architecture
- Newman, S. (2015). Building Microservices. O’Reilly.
- Decomposition patterns for software systems
Site Reliability Engineering
- Beyer, B., et al. (2016). Site Reliability Engineering: How Google Runs Production Systems. O’Reilly. Free online
- Error budgets, monitoring, incident response
Circuit Breakers
- Nygard, M. (2018). Release It! Design and Deploy Production-Ready Software (2nd ed.). Pragmatic Bookshelf.
- Patterns for resilient systems
Automotive Safety
ISO 26262
- ISO 26262:2018. Road vehicles — Functional safety.
- International standard for automotive safety integrity levels
ASIL Decomposition
- D'Ambrosio, J., & Debouk, R. (2013). ASIL Decomposition: The Good, the Bad, and the Ugly. SAE Technical Paper 2013-01-0195.
- Practical guidance on safety requirement allocation
Additional Resources
Online Courses
- MIT OpenCourseWare: Nuclear Systems Design Project (22.033)
- Stanford: CS224N Natural Language Processing with Deep Learning
- Coursera: Financial Engineering and Risk Management (Columbia)
Blogs & Forums
- AI Alignment Forum: alignmentforum.org
- LessWrong: lesswrong.com
- Anthropic Research: anthropic.com/research
Tools & Libraries
- LangChain: Agent orchestration framework
- Guardrails AI: LLM output validation
- OpenTelemetry: Observability framework
How to Cite This Framework
If referencing this documentation:
AI Safety Framework: Structural Constraints for Safe AI Systems. https://[domain]/
See Also
- Related Approaches: How this framework compares to alternatives
- Background Research: Deep dives into specific domains
- Glossary: Terminology definitions