
Bibliography & References

Key papers, books, and resources that inform this framework, organized by domain.

  • Drexler, K.E. (2019). Reframing Superintelligence: Comprehensive AI Services as General Intelligence. Technical Report, Future of Humanity Institute. Link
  • Foundational proposal for building superintelligence as an ecosystem of narrow services
  • Greenblatt, R., et al. (2024). AI Control: Improving Safety Despite Intentional Subversion. Redwood Research. arXiv
  • Protocols for maintaining safety even if models are scheming
  • Bowman, S., et al. (2022). Measuring Progress on Scalable Oversight for Large Language Models. arXiv
  • Methods for humans to effectively oversee increasingly capable AI
  • Bai, Y., et al. (2022). Constitutional AI: Harmlessness from AI Feedback. Anthropic. arXiv
  • Training models to follow principles through self-critique

Iterated Distillation and Amplification (IDA)

  • Christiano, P., et al. (2018). Supervising strong learners by amplifying weak experts. arXiv
  • Recursive amplification of human judgment through AI assistance
  • Gwern (2016, updated 2023). Why Tool AIs Want to Be Agent AIs. gwern.net
  • Economic and practical pressures toward agentic AI systems
  • Dalrymple, D., et al. (2024). Towards Guaranteed Safe AI: A Framework for Ensuring Robust and Reliable AI Systems. arXiv
  • Formal verification approaches to AI safety

  • NUREG-1150 (1990). Severe Accident Risks: An Assessment for Five US Nuclear Power Plants. US Nuclear Regulatory Commission.
  • Landmark study establishing probabilistic risk assessment (PRA) methodology
  • Vesely, W.E., et al. (1981). Fault Tree Handbook. NUREG-0492, US Nuclear Regulatory Commission. PDF
  • Standard reference for fault tree construction and analysis (a quantification sketch follows this list)
  • IAEA Safety Standards Series No. SF-1 (2006). Fundamental Safety Principles.
  • International framework for nuclear safety philosophy
  • Rasmussen, N.C., et al. (1975). Reactor Safety Study: An Assessment of Accident Risks in US Commercial Nuclear Power Plants (WASH-1400). US Nuclear Regulatory Commission.
  • Original “Rasmussen Report” establishing quantitative risk assessment
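
A minimal sketch of the fault-tree quantification these references formalize: basic-event probabilities combined through AND/OR gates under the usual independence assumption. The event names and probabilities below are invented for illustration, not taken from any of the cited studies.

```python
# Minimal fault-tree quantification sketch. Assumes statistically independent
# basic events; all event names and probabilities are hypothetical.

def and_gate(*probs: float) -> float:
    """Gate output fails only if every input fails: product of probabilities."""
    result = 1.0
    for p in probs:
        result *= p
    return result

def or_gate(*probs: float) -> float:
    """Gate output fails if any input fails: complement of all inputs surviving."""
    survive = 1.0
    for p in probs:
        survive *= (1.0 - p)
    return 1.0 - survive

# Hypothetical basic-event failure probabilities (per demand).
pump_a_fails = 1e-3
pump_b_fails = 1e-3
valve_sticks = 5e-4
operator_error = 1e-2

# Top event: cooling is lost if both redundant pumps fail, or the valve
# sticks, or the operator intervenes incorrectly.
p_loss_of_cooling = or_gate(
    and_gate(pump_a_fails, pump_b_fails),
    valve_sticks,
    operator_error,
)
print(f"P(loss of cooling) ~ {p_loss_of_cooling:.2e}")  # ~1.05e-02, dominated by operator error
```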

  • Litterman, R. (1996). Hot Spots and Hedges. Journal of Portfolio Management.
  • Introduction to risk contribution analysis
  • Tasche, D. (2008). Capital Allocation to Business Units and Sub-Portfolios: the Euler Principle. arXiv
  • Mathematical foundations of coherent risk allocation
  • Jorion, P. (2006). Value at Risk: The New Benchmark for Managing Financial Risk (3rd ed.). McGraw-Hill.
  • Standard reference for VaR methodology
  • Acerbi, C., & Tasche, D. (2002). On the coherence of expected shortfall. Journal of Banking & Finance.
  • Why expected shortfall (ES/CVaR) is preferred to VaR for tail risk (see the sketch after this list)
  • Artzner, P., et al. (1999). Coherent Measures of Risk. Mathematical Finance.
  • Axiomatic foundations for risk measurement
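
To make the VaR versus expected-shortfall contrast concrete, the sketch below estimates both at the 99% level from a synthetic heavy-tailed loss sample. The loss distribution and confidence level are arbitrary illustrative choices, not part of the cited methodology.

```python
# Historical-simulation style estimates of value at risk (VaR) and expected
# shortfall (ES/CVaR) at the 99% level, on synthetic heavy-tailed losses.
import random

random.seed(0)
losses = [random.paretovariate(2.0) for _ in range(100_000)]  # hypothetical loss sample

alpha = 0.99
ordered = sorted(losses)
cutoff = int(alpha * len(ordered))

var_99 = ordered[cutoff]            # loss level exceeded ~1% of the time
tail = ordered[cutoff:]             # the worst ~1% of outcomes
es_99 = sum(tail) / len(tail)       # average severity within that tail

print(f"VaR(99%) = {var_99:.2f}")   # ~10 for this distribution
print(f"ES(99%)  = {es_99:.2f}")    # ~20: ES sees how bad the tail is, VaR does not
```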

  • Saltzer, J.H., & Schroeder, M.D. (1975). The Protection of Information in Computer Systems. Proceedings of the IEEE.
  • Classic paper establishing security design principles
  • Schneier, B. (2000). Secrets and Lies: Digital Security in a Networked World. Wiley.
  • Accessible introduction to security engineering principles
  • Lamport, L., Shostak, R., & Pease, M. (1982). The Byzantine Generals Problem. ACM Transactions on Programming Languages and Systems.
  • Foundational paper on consensus despite malicious actors
  • Avizienis, A. (1985). The N-Version Approach to Fault-Tolerant Software. IEEE Transactions on Software Engineering.
  • Independently developed implementations of the same specification, voted for reliability (a toy voting sketch follows this list)
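
A toy rendering of the N-version idea: run independently developed implementations of one specification and accept the majority answer, so a single faulty version is masked. The three "versions" here are trivial stand-ins invented for illustration.

```python
# N-version programming sketch: majority vote over independently written
# implementations of one specification (absolute value). Stand-ins are invented.
from collections import Counter

def version_a(x: int) -> int:
    return abs(x)

def version_b(x: int) -> int:
    return x if x >= 0 else -x

def version_c(x: int) -> int:
    return x  # deliberately faulty stand-in: wrong for negative inputs

def voted_abs(x: int) -> int:
    """Return the majority answer across versions; fail loudly if there is none."""
    tally = Counter(version(x) for version in (version_a, version_b, version_c))
    answer, votes = tally.most_common(1)[0]
    if votes < 2:
        raise RuntimeError(f"no majority for input {x}: {dict(tally)}")
    return answer

for x in (3, 0, -7):
    print(x, "->", voted_abs(x))  # the two correct versions outvote the faulty one
```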

  • Myerson, R.B. (1981). Optimal Auction Design. Mathematics of Operations Research.
  • When truthful mechanisms are optimal
  • Vickrey, W. (1961). Counterspeculation, Auctions, and Competitive Sealed Tenders. Journal of Finance.
  • Clarke, E.H. (1971). Multipart Pricing of Public Goods. Public Choice.
  • Groves, T. (1973). Incentives in Teams. Econometrica.
  • Foundations of incentive-compatible mechanism design
  • Bolton, P., & Dewatripont, M. (2005). Contract Theory. MIT Press.
  • Comprehensive treatment of principal-agent problems
  • Shapley, L.S. (1953). A Value for n-Person Games. In Contributions to the Theory of Games II.
  • Lundberg, S.M., & Lee, S.I. (2017). A Unified Approach to Interpreting Model Predictions (SHAP). arXiv
  • Foundational and modern applications of Shapley values (a worked example follows this list)
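
As a reminder of what a Shapley value is, here is a minimal exact computation for a small hypothetical coalition game. The component names and characteristic function are made up, and the brute-force enumeration only scales to a handful of players; SHAP exists to approximate this kind of attribution at scale.

```python
# Exact Shapley values for a small cooperative game, by averaging each
# player's marginal contribution over all join orders. The component names
# and the characteristic function are hypothetical.
from itertools import permutations

players = ["retrieval", "planner", "executor"]

# v(S): made-up "risk reduction" achieved by each coalition of components.
v = {
    frozenset(): 0.0,
    frozenset({"retrieval"}): 1.0,
    frozenset({"planner"}): 2.0,
    frozenset({"executor"}): 2.0,
    frozenset({"retrieval", "planner"}): 4.0,
    frozenset({"retrieval", "executor"}): 4.0,
    frozenset({"planner", "executor"}): 5.0,
    frozenset({"retrieval", "planner", "executor"}): 8.0,
}

shapley = {p: 0.0 for p in players}
orders = list(permutations(players))
for order in orders:
    coalition = frozenset()
    for p in order:
        # Marginal contribution of p when joining the players before it.
        shapley[p] += v[coalition | {p}] - v[coalition]
        coalition = coalition | {p}
shapley = {p: total / len(orders) for p, total in shapley.items()}

print(shapley)  # {'retrieval': 2.0, 'planner': 3.0, 'executor': 3.0}; sums to v(grand coalition)
```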

  • Girard, J.-Y. (1987). Linear Logic. Theoretical Computer Science, 50(1). DOI
  • Resource-sensitive logic foundational to consumable delegation risk budgets (see the use-once sketch after this list)
  • Fritz, T. (2020). A Synthetic Approach to Markov Kernels, Conditional Independence and Theorems on Sufficient Statistics. Advances in Mathematics, 370. arXiv
  • Compositional probability theory for risk propagation
  • Dennis, J.B., & Van Horn, E.C. (1966). Programming Semantics for Multiprogrammed Computations. Communications of the ACM.
  • Miller, M.S. (2006). Robust Composition: Towards a Unified Approach to Access Control and Concurrency Control. PhD thesis, Johns Hopkins University.
  • Foundational work on capability systems
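
One way to read the linear-logic connection is operational: a delegation risk budget behaves like a use-once resource that can be split and consumed but never duplicated or reused. The sketch below is only that analogy, not Girard's calculus; the class, method names, and numbers are invented for illustration.

```python
# Use-once risk budget sketch, mirroring the linear-logic discipline that a
# resource is consumed exactly once. Class, methods, and numbers are illustrative.

class RiskBudget:
    """A failure-probability allowance that can be split or spent, never reused."""

    def __init__(self, amount: float):
        self._amount = amount
        self._consumed = False

    def split(self, portion: float) -> "RiskBudget":
        """Carve out a sub-budget for a delegate; the parent keeps the remainder."""
        if self._consumed or portion > self._amount:
            raise ValueError("budget unavailable")
        self._amount -= portion
        return RiskBudget(portion)

    def spend(self) -> float:
        """Consume this budget entirely; any further use is an error."""
        if self._consumed:
            raise ValueError("budget already consumed")
        self._consumed = True
        return self._amount

parent = RiskBudget(1e-4)           # total tolerable failure probability
child = parent.split(2e-5)          # hand a slice to a delegated subtask
print(f"{child.spend():.1e}")       # 2.0e-05
print(f"{parent.spend():.1e}")      # 8.0e-05 left for everything else
# Calling child.spend() again would raise: the resource is gone.
```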

  • Newman, S. (2015). Building Microservices. O’Reilly.
  • Decomposition patterns for software systems
  • Beyer, B., et al. (2016). Site Reliability Engineering: How Google Runs Production Systems. O’Reilly. Free online
  • Error budgets, monitoring, and incident response (error-budget arithmetic is sketched after this list)
  • Nygard, M. (2018). Release It! Design and Deploy Production-Ready Software (2nd ed.). Pragmatic Bookshelf.
  • Patterns for resilient systems
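
The error-budget idea reduces to simple arithmetic: the SLO implies an allowed amount of unreliability per window, and the unspent remainder governs how much change risk can still be taken. The SLO and incident figures below are hypothetical.

```python
# Error-budget arithmetic in the style of the SRE book. Numbers are hypothetical.

slo_availability = 0.999            # target: 99.9% of the window is "good"
window_minutes = 30 * 24 * 60       # 30-day rolling window

budget_minutes = (1 - slo_availability) * window_minutes   # 43.2 minutes
downtime_minutes = 25.0             # hypothetical incident impact so far

remaining = budget_minutes - downtime_minutes
print(f"budget    {budget_minutes:.1f} min / window")
print(f"spent     {downtime_minutes:.1f} min")
print(f"remaining {remaining:.1f} min")
if remaining <= 0:
    print("Budget exhausted: freeze risky launches until the window rolls over.")
```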

  • ISO 26262:2018. Road vehicles — Functional safety.
  • International standard for automotive safety integrity levels
  • Palin, R., & Goff, L. (2010). ASIL Decomposition: The Good, the Bad and the Ugly. SAE Technical Paper.
  • Practical guidance on safety requirement allocation

  • MIT OpenCourseWare: Nuclear Systems Design Project (22.033)
  • Stanford: CS224N Natural Language Processing
  • Coursera: Financial Engineering and Risk Management (Columbia)
  • LangChain: Agent orchestration framework
  • Guardrails AI: LLM output validation
  • OpenTelemetry: Observability framework

If referencing this documentation:

AI Safety Framework: Structural Constraints for Safe AI Systems.
https://[domain]/