Paul Graham reflects on what he's learned from advising thousands of YC startups. Core claim: most startups face the same recurring problems, but founders are often terrible at diagnosing which ones actually matter. The primary value of YC is forcing radical focus on the right problems at high frequency, retraining founders away from 'hacking the test' mentality, and creating a dense peer network that compounds both advice and energy.
Key insights
- ·Most startups have the same problems regardless of what they build. After advising ~100 startups, you rarely see new failure modes. This pattern-matching is why YC works but takes volume to learn—later-stage investors never get this dataset.
- ·Founders routinely misdiagnose their problems. They'll obsess over fundraising when the real issue is the product sucks, or worry about user acquisition when they wouldn't use their own product. They confuse symptom for cause and can't rank urgency (boyfriend vs. murderer analogy).
- ·Founders don't listen because startup advice is counterintuitive, so correct guidance literally sounds wrong to them. They only believe it after painful experience. This isn't stubbornness—it's that startups operate on rules opposite to school/jobs.
- ·The essence of YC is forcing founders to identify the 1-2 problems that will kill the company this week, propose testable solutions, execute, and measure results within 7 days. High-frequency course correction lets you be decisive at the micro scale while staying tentative at the macro scale—like a running back.
- ·Educational systems train you to 'hack the test' rather than learn the material. This works until startups, where there is no test to game. YC spends a year deprogramming this reflex, and founders still revert to old habits.
- ·YC is specialization, not apprenticeship. Partners have encyclopedic breadth across startup problems; founders have deep domain knowledge. Neither should acquire the other's shape of knowledge—hence why even experienced founders benefit from coaching.
- ·The peer network may matter more than partner advice. Great work clusters (Florence 1490s, Bell Labs, PARC). YC deliberately designed itself as a cluster. The energy at YC dinners is unique, and founders are shockingly generous helping each other—magnified by intentional design.
Action items
- →Adopt weekly review cadence for Bureau/Fuse: What are the 1-2 problems that will kill growth this week? Propose testable solution, execute, measure by next week. Force ranking of urgency, not just problem lists.
- →Audit whether you're 'hacking the test' anywhere (fundraising theater, vanity metrics, playing investor expectations vs. building real value). PG's framing: are you optimizing for the measurement or the thing being measured?
- →Consider creating a lightweight 'founder cluster' for your portfolio/network—monthly dinners, async Slack for real-time problem-solving. The ROI on peer energy + pattern-matching may exceed 1-on-1 advising.
Paul Graham's comprehensive essay on doing great work distills patterns across disciplines into actionable principles. Core thesis: great work emerges from the intersection of natural aptitude, deep interest, and ambitious scope—executed through cycles of curiosity-driven exploration, noticing gaps at knowledge frontiers, and iterative building. The essay emphasizes that choosing what to work on is often more important than execution skill, and that consistency + compounding beats sporadic bursts.
Key insights
- ·Work selection hierarchy inverted: Graham argues choosing WHAT to work on (the question) often matters more than HOW you solve it. Most underrate problem selection vs. solution execution. This directly applies to Marco deciding between AI infrastructure plays vs. flex-living operational innovation.
- ·Staying upwind > planning: Don't plot 5-year roadmaps. Instead maintain 'invariants' (work on exciting + ambitious + gives good optionality) and let the path emerge. Each stage, do what's most interesting with best future options. Mirrors how Fuse/Bureau likely evolved vs. being master-planned.
- ·Per-project procrastination is the killer: Delaying the ambitious project year after year (camouflaged as 'productive work on safer bets') does more damage than daily procrastination. Question: 'Am I working on what I MOST want to work on?' becomes critical as you age.
- ·Compounding + consistency beats intensity: Writing one page/day = book/year. Building one small AI tool/week >> one massive project/quarter that never ships. Exponential growth feels flat early (people quit), but conscious investment in compounding domains (learning, audience, code infrastructure) separates great from good (see the arithmetic sketch after this list).
- ·Make successive versions, start laughably small: Great work is almost never planned in final form—it's evolved. Ship v1 fast (even if dismissed as 'toy'), iterate on user response. Directly applicable to Bureau's agent products and Marco's real estate portfolio expansion strategy.
- ·Curious to a degree that bores others = your edge: If you're obsessively interested in something most people find tedious (AI agent orchestration, peptide protocols, flex-living unit economics), that asymmetry IS the signal. Don't flatten your 'weird' interests to seem normal.
- ·Earn your right to break rules via strictness: The best work comes from people who are BOTH strict (notice where reality conflicts with models) AND willing to break implicit rules. Not rebellious for rebellion's sake, but because they see something others don't. Example: Einstein with Maxwell's equations.
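A minimal arithmetic check on the compounding insight above (numbers are illustrative, not from the essay):

```python
# One page a day is a book a year -- the linear baseline.
print(1 * 365)  # 365 pages

# A 1% daily improvement vs. flat effort: exponential growth looks flat
# early (which is exactly when people quit), then dwarfs the linear path.
for day in (30, 180, 365):
    print(day, round(1.01 ** day, 1))  # 30 -> 1.3, 180 -> 6.0, 365 -> 37.8
```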
Action items
- →Weekly audit: 'Am I working on what I MOST want to work on?' (not just productive work). If answer is no repeatedly, per-project procrastination is happening.
- →Catalog 3-5 questions from youth/early career that still nag at you—probably fertile ground for differentiated bets now that you have resources/expertise.
- →Identify 1-2 compounding domains to deliberately invest in (e.g., agent infrastructure knowledge, longevity biomarker tracking, Dubai RE network effects) where daily small inputs create exponential curves.
- →List what you're 'excessively curious about to a degree that bores most people'—treat that asymmetry as strategic signal, not a quirk to downplay.
Paul Graham argues that performance returns in most meaningful domains are superlinear (not linear as teachers claim). Small performance gaps yield massive outcome differences. This stems from two causes: exponential growth (learning, network effects, compounding) and thresholds (winner-take-all dynamics). The decline of institutional gatekeeping is expanding who can access these returns, favoring independent-minded risk-takers willing to work on novel problems.
Key insights
- ·Half as good = zero customers, not half. In business, fame, knowledge, and power, marginal performance differences create exponential outcome gaps. Teachers lied: you don't 'get out what you put in' linearly.
- ·Two root causes of superlinear returns: (1) exponential growth (learning compounds, networks scale, growth begets growth) and (2) thresholds (crossing a performance bar unlocks disproportionate rewards). These often reinforce each other (see the numeric sketch after this list).
- ·Work that compounds is the selection heuristic. Either direct compounding (infrastructure, audience, brand) or learning compounding. Even if you fail at the immediate goal, if you're learning fast you're on an exponential path.
- ·Institutions damped variance historically—prestige = org prestige. Now technology + decentralization means individuals can capture returns artists/writers once had. This creates MORE inequality but also democratizes access to trying. Not everyone should opt in; only those who can afford the downside variance.
- ·Fields with superlinear returns share a trait: independent-mindedness required. Where a few big winners dominate (science, investing, startups, art) you must have novel + correct ideas, not just correct. Consensus beliefs = no alpha.
- ·Do things that don't scale to get the initial toehold. Superlinear curves look discouraging early (flat start) but reward extraordinary early effort because the steep end justifies it. Most people quit before the curve bends.
- ·Follow curiosity over careerism for intellectual breakthroughs. Problems that seem 'mystifying but unimportant' (not boring, not obviously prestigious) hide new fields. Ambition climbs existing peaks; deep curiosity can grow a new mountain beneath you.
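A toy model of the two mechanisms above (payoff functions and numbers are invented for illustration, not from the essay):

```python
def linear_payoff(quality: float) -> float:
    # The "get out what you put in" model teachers imply.
    return quality

def threshold_payoff(quality: float) -> float:
    # Half as good = zero customers, not half the customers.
    return quality if quality >= 1.0 else 0.0

def compounding_payoff(quality: float, periods: int = 10) -> float:
    # Small per-period edges compound into large outcome gaps.
    return quality ** periods

for q in (0.5, 0.9, 1.0, 1.1):
    print(q, linear_payoff(q), threshold_payoff(q), round(compounding_payoff(q), 2))
# 0.9 vs 1.1 quality ends ~7x apart after compounding (0.35 vs 2.59), not 1.2x
```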
Action items
- →Audit current work through 'does this compound?' lens—Bureau AI agents likely yes (learning + infrastructure), flex-living operations maybe (brand/systems), RE portfolio if actively leveraging into new plays.
- →Pressure-test Bureau strategy: are you doing non-scalable things for early customers to kickstart word-of-mouth exponential growth, or chasing vanity metrics?
- →Review learning systems: are you structuring compounding knowledge capture (Zettelkasten, evergreen notes) or just consuming content? PG's heuristic = 'always be learning' but make it sticky.
- →Identify one 'mystifying but unimportant' question in your stack (agents, RE operating models, longevity protocols) and allocate 10% time. High EV if it opens a new field for you.
Paul Graham argues the best way to generate new ideas is to notice anomalies—things that seem strange, missing, or broken. While anomalies exist in everyday life, the highest-value ones live at the frontiers of knowledge. Knowledge grows fractally: from afar it looks smooth, but up close the edges are full of obvious gaps no one has explored. Exploring these gaps can yield entirely new fields.
Key insights
- ·New ideas come from noticing what's broken, missing, or strange—not from brainstorming in a vacuum. Anomaly-detection is the core skill.
- ·The best anomalies are at the knowledge frontier, not in everyday observations. You have to get close enough to a field to see the fractal gaps that aren't visible from a distance.
- ·Knowledge grows fractally: edges that look smooth from outside reveal obvious, unexplored gaps when you're inside. These gaps feel inexplicable once you see them—'why hasn't anyone tried X?'
- ·Exploring frontier gaps can create entirely new fractal buds (new sub-fields or categories), not just incremental improvements.
Action items
- →Deliberately position at the edge of AI agents / flex-living / RE—read cutting-edge papers, follow builders live-tweeting, attend niche communities. Surface gaps that insiders see but outsiders miss.
- →When building with Claude Code / agent infra, keep an 'anomaly log'—specific things that feel broken, missing primitives, weird UX gaps. Share publicly; this builds founder-operator credibility.
- →Ask: what's the fractal bud in flex-living or agent tooling that no one's named yet? What category doesn't exist but should?
YC's Francois Chodard explains how recursion at inference time (not just scaling parameters) can dramatically improve AI reasoning. He contrasts two 2025 papers—HRM (27M params) and TRM (7M params)—both achieving ~70-87% on ARC Prize, outperforming models 10,000x larger. The core innovation: outer refinement loops + truncated backprop through time (t=1) + training across latent 'memory states' instead of discrete token chains of thought. These tiny recursive models solve incompressible problems (Sudoku, mazes) that transformers provably can't in one shot. The future: fusing LLM's rich embeddings with tiny recursive reasoning layers.
Key insights
- ·Transformers are fundamentally limited on incompressible tasks (sorting, Sudoku, mazes) because they can't do O(n log n) comparisons in fixed layers. Chain-of-thought & tool-use are hacks bounded by training data, not discovery.
- ·HRM/TRM use *outer refinement loops* (running the same weights 16 times on evolving hidden states) as a pseudo-minibatch across memory space, not input space. This sidesteps the vanishing gradients of classic RNN backprop-through-time (see the sketch after this list).
- ·TRM's key simplification: weight-share between high/low-level nets, use 1 transformer layer (vs 4), backprop through only 1 full recursion step (t=1). Result: 7M params beat 100B+ models on ARC Prize.
- ·The training feels like EM: update local working memory (Z_L) conditioned on candidate answer (Z_H), then update candidate conditioned on working memory. Model learns the *algorithm* to solve the problem, not memorize examples.
- ·Recursion depth at test time matters less than train-time recursion—models trained on 16 steps perform nearly as well tested on 1 step. This is counterintuitive but confirmed by ablations.
- ·The frontier opportunity: LLMs build amazing embedding spaces, but reason via token-space hacks. Fusing LLM embeddings with tiny recursive reasoning modules (like TRM) could unlock step-change in efficiency and capability on hard problems.
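A hedged sketch of the outer refinement loop described above, in PyTorch. This is not the actual TRM architecture (the real model uses a tiny transformer over grid tokens plus deep supervision); `TinyRecursiveModel`, its dimensions, and the residual update rule are illustrative assumptions—only the loop structure (shared weights, many outer steps, gradients through one step) mirrors the papers:

```python
import torch
import torch.nn as nn

class TinyRecursiveModel(nn.Module):
    """Illustrative stand-in for TRM: one shared block updates both the
    working memory z (Z_L) and the candidate answer y (Z_H)."""
    def __init__(self, dim: int):
        super().__init__()
        self.block = nn.Sequential(
            nn.Linear(3 * dim, dim), nn.GELU(), nn.Linear(dim, dim)
        )

    def step(self, x, y, z):
        # Update working memory conditioned on input + current answer...
        z = z + self.block(torch.cat([x, y, z], dim=-1))
        # ...then update the candidate answer conditioned on the new memory.
        y = y + self.block(torch.cat([x, y, z], dim=-1))
        return y, z

def refine(model, x, y, z, outer_steps=16):
    # Run the same weights many times over evolving hidden states, but
    # let gradients flow through only the final call: all earlier steps
    # run under no_grad. This is the truncated "t=1" backprop-through-time.
    with torch.no_grad():
        for _ in range(outer_steps - 1):
            y, z = model.step(x, y, z)
    return model.step(x, y, z)

model = TinyRecursiveModel(dim=32)
x = torch.randn(8, 32)                         # embedded puzzle input
y, z = torch.zeros(8, 32), torch.zeros(8, 32)  # initial answer + memory
y, z = refine(model, x, y, z)
loss = y.pow(2).mean()                         # placeholder loss
loss.backward()                                # gradients from one step only
```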
Action items
- →Experiment with TRM-style outer refinement loops in agent workflows—run Claude/GPT with persistent memory state across calls, not just token context.
- →Explore recursive prompting architectures: Can you simulate 'latent recursion' by having an LLM update a structured scratchpad (JSON state) over multiple inference passes before final output? (sketched after this list)
- →Track research from Alexia Jolicoeur-Martineau (TRM author) and the Sapient team (HRM)—this is bleeding-edge agent infra that will likely inform next-gen reasoning models like o3/Claude Opus successors.
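A minimal sketch of the scratchpad-recursion idea above. Assumptions: `call_llm` is a placeholder for whatever client you actually use, and the scratchpad schema is invented for illustration—the point is that persistent structured state across passes stands in for the latent memory a recursive model would carry:

```python
import json

def call_llm(prompt: str) -> str:
    # Placeholder: wire this to your actual LLM client.
    raise NotImplementedError

def refine_answer(question: str, passes: int = 4) -> dict:
    # The scratchpad persists across inference passes; each pass refines
    # it rather than restarting from a blank token context.
    state = {"hypothesis": "", "evidence": [], "confidence": 0.0}
    for _ in range(passes):
        prompt = (
            "Refine this scratchpad for the question below. "
            "Return ONLY the updated JSON, same keys.\n"
            f"Question: {question}\n"
            f"Scratchpad: {json.dumps(state)}"
        )
        state = json.loads(call_llm(prompt))
    return state
```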
Max Schoening (Head of Product at Notion) argues that in the AI era, agency—not skills—is the differentiator. He describes how Notion designers/PMs now prototype in code (not Figma), how the first 10% of any project is now 'free,' and why great products succeed on one tiny exceptional core, not feature accumulation. He's skeptical of the 'SaaS apocalypse' narrative and believes we already have UBI—it's called knowledge work.
Key insights
- ·Agency > skills: AI tools give everyone coding skills at their fingertips, but the differentiator is now *agency*—the belief that you can just change things. Cultivate agency by making/tinkering repeatedly, not by reading about roles.
- ·First 10% of every project is now free: Building the janky first version (demo, not memo) takes almost no effort. The last 10% is still 90% of the work. This shifts product process from waterfall PRDs to rapid prototyping in code.
- ·Designers/PMs at Notion now ship code: They moved prototyping from Figma to a small LLM-friendly playground, then gradually to production. The goal isn't shipping code per se—it's *thinking in the material* (understanding agent loops, not just styling).
- ·Great products have one tiny superpower core: Multi-touch on iPhone, pull requests on GitHub, blocks/slash commands in Notion, 'git push heroku master.' Avoid the trap of 'if I add one more feature it'll finally be great'—that never works.
- ·Taste = running a virtual machine in your head: Predicting how a specific in-group will react to an idea. Built via reps + feedback (like training a model). Designers with best taste constantly tinker with new apps + build full-stack side projects.
- ·Malleable software matters more now: Software should work for the user's interests, not the corporation's. AI makes building custom tools trivial—but you need a platform (OS) that encourages malleability without losing collaboration/security. Notion is positioning as that OS.
- ·Token spend isn't the metric: Like Marco, Max doesn't care about token budgets or leaderboards (though Notion's top PM spends 'thousands, maybe tens of thousands'). What matters is enlisting agents into the outer loop of your work, not bragging about LOC-equivalent metrics.
Action items
- →If you haven't already, set up a small LLM-friendly playground (separate from your main codebase) so non-engineers can prototype with AI tools without fear. This is how Notion onboarded designers/PMs to coding.
- →Audit your real estate/flex-living ops: Where is the 'first 10% now free'? Could you use Cursor/Replit to build custom tenant intake flows, pricing calculators, or ops dashboards in a weekend instead of hiring devs?
- →Apply 'obviously good' + 'tiny core' test to Bureau/Fuse products: What's the one thing that's so exceptionally good it's intoxicating? If you're adding features hoping it'll 'finally be great,' stop—consolidate to the naked robotic core.
- →Consider building a malleable 'OS' layer for flex-living operations—like Notion for real estate—where property managers/founders can customize workflows without vendor lock-in. This is where AI + RE leverage converge.
YC's take: the next trillion 'users' are AI agents, not humans. Agents currently operate via human-designed UIs (slow, brittle). The real opportunity is building agent-first software—machine-readable interfaces (APIs, MCPs, CLIs), not visual dashboards. Every major SaaS category will be rebuilt for agents as first-class users, and incumbents won't do it. Startups that build the picks-and-shovels infrastructure for agents will win.
Key insights
- ·Agents browsing the web / buying / managing CRMs via legacy human UIs is fundamentally inefficient; they need APIs, MCPs (Model Context Protocols), and CLIs as native interfaces.
- ·Agent-first software requires machine-readable documentation and zero-human-in-the-loop onboarding—agents must discover, sign up, and integrate tools programmatically (see the manifest sketch after this list).
- ·Every major SaaS category (CRM, analytics, scheduling, etc.) will be rebuilt for agents; incumbents adding 'agent support' won't capture the opportunity—new startups building agent-native will.
- ·YC is explicitly funding founders building infrastructure for agents, not just building agents themselves—classic picks-and-shovels play in the AI gold rush.
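A hedged illustration of what 'machine-readable, zero-human onboarding' could look like: a tool manifest an agent can discover and call programmatically. Field names are loosely modeled on MCP-style tool definitions, not the official spec, and `create_booking` is a made-up flex-living example:

```python
TOOL_MANIFEST = {
    "name": "create_booking",
    "description": "Create a short-stay booking for a unit.",
    "input_schema": {
        "type": "object",
        "properties": {
            "unit_id": {"type": "string"},
            "check_in": {"type": "string", "format": "date"},
            "nights": {"type": "integer", "minimum": 1},
        },
        "required": ["unit_id", "check_in", "nights"],
    },
}

def handle_tool_call(name: str, args: dict) -> dict:
    # An agent reads the manifest, constructs `args` itself, and calls
    # the endpoint -- no dashboard, no signup form, no human in the loop.
    if name == "create_booking":
        return {"status": "confirmed", "unit_id": args["unit_id"]}
    raise ValueError(f"unknown tool: {name}")
```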
Action items
- →Audit Bureau AI's product stack: are you building agent-native interfaces (MCP support, programmatic docs, API-first) or retrofitting human UIs? If the latter, roadmap the former.
- →Explore building an MCP connector or agent-optimized API layer for Bureau AI's workflow automation—position as the agent-first alternative to legacy task management SaaS.
- →Watch for agent-first CRM, scheduling, or property management tools emerging in flex-living/RE—evaluate for Fuse or personal portfolio operational leverage.
Paul Graham argues reading is irreplaceable not because it's the best way to acquire information, but because writing is essential to thinking—and you can't write well without reading well. The act of writing generates new ideas that don't emerge from just talking or thinking. People who want to have ideas (not just consume information) must remain good readers and writers.
Key insights
- ·Writing is a thinking tool, not just a communication tool. Good writers discover new ideas in the process of writing itself—there's no substitute for this type of discovery.
- ·Complex, ill-defined problems (the kind founders face constantly) almost always require writing to solve properly. Verbal discussion helps but leaves discoveries on the table.
- ·You can't write well without reading well, which means people who want to generate original ideas can't afford to abandon deep reading, even if information gets faster to consume via other means.
- ·Reading teaches you how to write in a way audiobooks don't. The act of extracting meaning from text on a page builds the muscle you use when constructing arguments on the page.
Action items
- →Audit how much actual writing Marco does when solving strategic problems (fundraising strategy, operating model decisions, positioning)—are there places he's skipping the writing step and leaving insights undiscovered?
- →Consider a weekly writing practice for clarifying fuzzy strategy questions (doesn't need to be public, just structured thinking on paper).
Paul Graham proposes 'alien truth' as a framework for finding universal principles beyond math/physics—truths any intelligent being would share (controlled experiments, practice improving skill, Occam's razor, possibly justice). He argues this is what philosophy should be: discovering principles robust enough to transcend human-specific cognition. He notes AI may let us test this empirically, but the exercise of targeting alien truth is valuable now regardless.
Key insights
- ·Alien truth = heuristic for finding maximally general principles. If a truth would plausibly hold for any intelligent life (not just humans), it's more fundamental than culture-specific ideas. This is a filter for first-principles thinking.
- ·Philosophy redefined: not academic jargon, but the search for truths robust across all forms of intelligence. Shifts philosophy from descriptive (what do humans think?) to normative (what must any rational agent believe?).
- ·AI as empirical test: We may soon create non-human intelligence (AGI) and observe whether it converges on Occam's razor, controlled experiments, etc. This could make philosophy testable, not just argumentative.
- ·Erdős's 'God's Book' extends beyond math: Really good proofs feel discovered, not invented, because their elegance is universal. Graham argues the same applies to core operating principles—some ideas are 'in the book' for any intelligence.
- ·Err on the side of generosity: Don't wait for certainty about what aliens would know. Use 'might plausibly be alien truth' as the bar. Paralysis from precision is the enemy; best guesses are often surprisingly close to optimal.
Action items
- →When designing agent architectures or prompting strategies, ask: 'Would any intelligent system converge on this principle, or is this a human workaround?' Optimize for alien-truth-grade reasoning patterns.
- →Filter business/operating principles through this lens: does this heuristic (e.g., 'move fast and break things,' 'margin of safety') rest on universal logic or cultural fashion? Prioritize the former for long-term leverage.
- →In team communication or investor memos, stress first-principles arguments over social proof. If you can frame a decision as alien-truth-adjacent (e.g., 'Occam's razor says simplest solution'), it's harder to argue against.
All-In Podcast hosts discuss OpenAI missing 2025 user/revenue targets (900M WAU vs 1B goal), contrasted with strong GPT-5.5/Codex reception and Anthropic's Opus 4.7 compute rationing. Debate centers on whether OpenAI's $600B compute commitments are visionary or reckless, with Chamath arguing power—not demand—is the constraint. Big Tech earnings show $725B capex guidance (Google/Microsoft/Amazon/Meta) with 60%+ cloud growth but collapsing free cash flow. Elon v. Altman trial surfaces Greg Brockman's incriminating diary. Retatrutide (Lilly's triple agonist) hype accelerates. A visit to the Supreme Court for the Monsanto/EPA case highlights the federal preemption debate.
Key insights
- ·OpenAI's consumer miss is offset by enterprise momentum: GPT-5.5 beats Opus 4.7 in coding due to OpenAI's superior compute capacity; Anthropic is token-constrained and rationing, opening a window for OpenAI/Grok to capture developer share.
- ·Power is the real AI bottleneck: <50% of announced gigawatt projects are actually under construction due to transformer/grid supply chain delays; this favors hyperscalers (Oracle, AWS, Azure, GCP) who can negotiate equity stakes with model labs starved for compute.
- ·Hyperscalers are sacrificing cash flow for infrastructure dominance: Google Cloud +63% YoY, Azure +30%, AWS +28%; free cash flow fell 97% at Amazon and 12% at Google/Microsoft as they commit $725B of 2026 capex—a structural shift from asset-light to asset-heavy models.
- ·Grok/SpaceX have a massive opening: Elon's excess capacity + Anthropic/OpenAI's power constraints = lucrative enterprise deals (Cursor was the 'appetizer'); Chamath suggests Elon should cut a deal with Dario immediately.
- ·Vibe coding hits trough of disillusionment: Claude agent deleted production database + backups in 9 seconds; AI still doesn't 'know what it doesn't know,' requires human supervision—Aaron Levie nails it: 'fantastic for devs, terrible for casuals maintaining complex systems.'
- ·Retatrutide = metabolic game-changer: Tri-agonist (GLP-1/GIP/glucagon) drives fat loss over muscle (-80% liver fat, -27% cholesterol, -37 lbs in 40 weeks), potential anti-aging via reduced inflammation; the fitness community will adopt it for cutting cycles pre-FDA approval (mid-2027).
- ·Federal preemption vs. state sovereignty is the sleeper policy fight: Supreme Court Monsanto case (EPA label authority vs. CA failure-to-warn) could blow up if states gain right to ignore federal regulatory bodies post-Chevron—implications for FDA, USDA, every agency.
Action items
- →Monitor Grok enterprise partnerships—if Elon signs deals with frontier labs or Fortune 500s, that's a leading indicator SpaceX AGI infrastructure is real revenue, not just capacity flex.
- →Assess power/grid infrastructure plays: if <50% of announced gigawatts will be built, companies solving the transformer supply chain or modular nuclear (like the Three Mile Island deal) are the pickaxes in this gold rush.
- →Track Retatrutide off-label adoption in longevity/fitness circles—early adopter demand signals how fast metabolic peptides move from clinical obesity to mainstream optimization (compare to the GLP-1 trajectory).
- →Consider exposure to hyperscaler debt instruments: as they lever up balance sheets for capex (Chamath predicts 'financial engineering'), investment-grade debt from MSFT/GOOGL/AMZN/META becomes a different risk/return profile than equity.
YC partner argues that hardware iteration speed is the critical bottleneck for US hardware startups vs China. In Shenzhen, design-to-part takes 1 day; in the US it takes weeks. YC is funding companies that compress this loop—rapid prototyping services, actuator manufacturers, and infrastructure that tightly integrates design, manufacturing, and logistics to enable order-of-magnitude faster hardware iteration.
Key insights
- ·The US-China hardware gap is fundamentally about iteration speed, not just supply chain access. Shenzhen's 1-day design-to-part loop vs US weeks creates compounding advantages over product development cycles.
- ·YC is shifting capital allocation toward picks-and-shovels infrastructure plays (HLABS for actuators, Prototyping IO for rapid mechanical parts) rather than just funding end hardware products.
- ·The winning pattern for hardware founders is tight vertical integration of design/manufacturing/logistics, mirroring how software startups own their deployment pipelines.
Action items
- →If Bureau AI builds any physical robotics/hardware prototypes, investigate Prototyping IO or equivalent EU rapid-turnaround shops to avoid multi-week lead times.
- →Framework: apply 'iteration speed as moat' lens to Fuse operations—where are our own feedback loops (guest experience → changes, unit economics → model tweaks) slower than they could be?