<?xml version="1.0" encoding="UTF-8"?><rss xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:atom="http://www.w3.org/2005/Atom" version="2.0" xmlns:itunes="http://www.itunes.com/dtds/podcast-1.0.dtd" xmlns:googleplay="http://www.google.com/schemas/play-podcasts/1.0"><channel><title><![CDATA[Flat Circle]]></title><description><![CDATA[Creative ways investors use LLMs in their research process]]></description><link>https://blog.flatcircle.ai</link><image><url>https://substackcdn.com/image/fetch/$s_!ulyQ!,w_256,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F22afde1e-8b68-470c-bb7c-cda75746a522_512x512.png</url><title>Flat Circle</title><link>https://blog.flatcircle.ai</link></image><generator>Substack</generator><lastBuildDate>Mon, 04 May 2026 06:47:51 GMT</lastBuildDate><atom:link href="https://blog.flatcircle.ai/feed" rel="self" type="application/rss+xml"/><copyright><![CDATA[Flat Circle]]></copyright><language><![CDATA[en]]></language><webMaster><![CDATA[flatcircleai@substack.com]]></webMaster><itunes:owner><itunes:email><![CDATA[flatcircleai@substack.com]]></itunes:email><itunes:name><![CDATA[Jim Moran]]></itunes:name></itunes:owner><itunes:author><![CDATA[Jim Moran]]></itunes:author><googleplay:owner><![CDATA[flatcircleai@substack.com]]></googleplay:owner><googleplay:email><![CDATA[flatcircleai@substack.com]]></googleplay:email><googleplay:author><![CDATA[Jim Moran]]></googleplay:author><itunes:block><![CDATA[Yes]]></itunes:block><item><title><![CDATA[Creative AI Use Cases - Apr 16, 2026]]></title><description><![CDATA[How to organize your second brain, custom datasets, clinical trial predictions and two new benchmarks]]></description><link>https://blog.flatcircle.ai/p/creative-ai-use-cases-apr-16-2026</link><guid isPermaLink="false">https://blog.flatcircle.ai/p/creative-ai-use-cases-apr-16-2026</guid><dc:creator><![CDATA[Jim 
Moran]]></dc:creator><pubDate>Thu, 16 Apr 2026 13:36:38 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!Gk3v!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0a8c215f-dac3-4e25-aeae-d088a08d0eb5_1672x562.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p><em>Flat Circle tracks creative uses of LLMs in hedge funds. Join hundreds of PMs, analysts and engineers reading each week:</em></p><p class="button-wrapper" data-attrs="{&quot;url&quot;:&quot;https://blog.flatcircle.ai/subscribe?&quot;,&quot;text&quot;:&quot;Subscribe now&quot;,&quot;action&quot;:null,&quot;class&quot;:null}" data-component-name="ButtonCreateButton"><a class="button primary" href="https://blog.flatcircle.ai/subscribe?"><span>Subscribe now</span></a></p><div><hr></div><h3>Custom datasets you can build with Claude Cowork</h3><p><strong>Flag discrepancies between earnings press releases and transcripts</strong> (<a href="https://deltasheets.substack.com/">InfoArb</a>)</p><p><strong>Yale finance professor creates a custom dataset of tariff exposure</strong> (<a href="https://paulgp.substack.com/p/from-edgar-filings-to-a-structured">Paul Goldsmith-Pinkham</a>)</p><p><strong>Extract supplier geographies from a 10-K filing, determine transit corridors and query AIS vessel-tracking databases and ICEGATE customs registries to audit and monitor supply chains</strong> (<a href="https://medium.com/write-a-catalyst/the-balance-sheet-is-already-a-lie-how-i-used-agentic-ai-to-audit-companies-in-real-time-53d01ea71a44">Alan Shore</a>)</p><p><strong>Analyze webcam footage of a fab construction site, counting building levels based on crane heights, to test consensus for WFE spending</strong> (<a href="https://x.com/lfg_cap/status/2041567709926711606">@lfg_cap</a>)</p><p><strong>Official Claude for Financial Services plugin, including skills like catalyst-calendar, model-update, morning-note and idea-generation 
</strong>(<a href="https://github.com/anthropics/financial-services-plugins/tree/main/equity-research/skills">GitHub</a>). Related, a trader published skills for market analysis, technical charting, economic calendars and screeners (<a href="https://github.com/tradermonty/claude-trading-skills">GitHub</a>)</p><h3>How to organize your second brain</h3><p>Lots of discussion this week about building your own personal knowledge base: a system of markdown files containing your investment philosophy, research, patterns, skills, notes, etc that lets Claude Cowork or OpenClaw deliver exactly what you need.</p><p>It&#8217;s all about how you organize it. Models perform worse if you feed them too much information, so you want to arrange your knowledge base so that your system pulls in exactly the context it needs to make a given decision, without missing anything important or diluting that context with unnecessary info.</p><p>A few approaches:</p><ul><li><p><strong>OpenAI cofounder Andrej Karpathy published LLM Wiki, which organizes your knowledge into a personal Wikipedia, with agents that maintain and cross-link concept pages. The focus is getting your system to reflect your views on a given company, sector or theme</strong> (<a href="https://x.com/karpathy/status/2039805659525644595">@karpathy</a>, <a href="https://gist.github.com/karpathy/442a6bf555914893e9891c11519de94f">GitHub</a>). 
</p></li><li><p><strong>The CEO of Y Combinator published GBrain, a framework optimized for systems like OpenClaw, where the focus is performing as many different actions as possible</strong> (<a href="https://github.com/garrytan/gbrain">GitHub</a>)</p></li><li><p><em><strong>The Fifth Element</strong></em><strong> star Milla Jovovich published MemPalace, a framework optimized for retrieval of verbatim documents</strong> (<a href="https://github.com/MemPalace/mempalace">GitHub</a>)</p></li><li><p><strong>A former sellsider recommends pulling in four types of context for each decision: (i) the most important events, (ii) the most recent events, (iii) retrieved related events and (iv) curated long-term memories. This attempts to model how the human brain compiles knowledge to make a decision</strong> (<a href="https://x.com/HenryChien4/status/2042609927110217970">@HenryChien4</a>)</p></li></ul><p><strong>My take:</strong> the right organizational scheme depends on what you&#8217;re optimizing for and your ability to catch mistakes. GBrain is great for personal AI assistants. LLM Wiki best reflects your views and experience. MemPalace optimizes for accurately returning the right source documents. The more gold-standard examples you can test against, the easier it is to experiment with organizational schemes.</p><h3>New papers and benchmarks</h3><p><strong>New paper on &#8220;LLM herding&#8221; shows that including buy/sell ratings significantly influences model responses, even if subsequent analysis contradicts the rating.</strong> </p><p><strong>My take:</strong> For investor agents, sellside research can act as a prompt injection: managing what insights are fed to the decision models is critical. 
This can be a problem when you enable a model with open web search, which can pick up sellside research headlines (<a href="https://openreview.net/pdf?id=AB0GxLOAn9">OpenReview</a>)</p><p><strong>New agent benchmark tests ability to build LBO, lender and DCF models: GPT 5.4 outperforms Claude and Gemini models but still lags human experts</strong> (<a href="https://arxiv.org/pdf/2604.05912">arXiv.org</a>). Related, an <a href="https://www.100baggerhunting.com/p/100-software-companies-tested-for">investor substack</a> tests the leading Claude models on assessing business quality and recommends Sonnet 4.6 w/ thinking mode enabled. </p><h3>Updates from AI trading arenas</h3><p><em>These are public experiments where agents make trading decisions. We track every arena <a href="https://blog.flatcircle.ai/p/ai-trading-arenas">here</a>:</em></p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://blog.flatcircle.ai/p/ai-trading-arenas" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!Gk3v!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0a8c215f-dac3-4e25-aeae-d088a08d0eb5_1672x562.png 424w, https://substackcdn.com/image/fetch/$s_!Gk3v!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0a8c215f-dac3-4e25-aeae-d088a08d0eb5_1672x562.png 848w, https://substackcdn.com/image/fetch/$s_!Gk3v!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0a8c215f-dac3-4e25-aeae-d088a08d0eb5_1672x562.png 1272w, 
https://substackcdn.com/image/fetch/$s_!Gk3v!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0a8c215f-dac3-4e25-aeae-d088a08d0eb5_1672x562.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!Gk3v!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0a8c215f-dac3-4e25-aeae-d088a08d0eb5_1672x562.png" width="1456" height="489" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/0a8c215f-dac3-4e25-aeae-d088a08d0eb5_1672x562.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:489,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:&quot;https://blog.flatcircle.ai/p/ai-trading-arenas&quot;,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!Gk3v!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0a8c215f-dac3-4e25-aeae-d088a08d0eb5_1672x562.png 424w, https://substackcdn.com/image/fetch/$s_!Gk3v!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0a8c215f-dac3-4e25-aeae-d088a08d0eb5_1672x562.png 848w, https://substackcdn.com/image/fetch/$s_!Gk3v!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0a8c215f-dac3-4e25-aeae-d088a08d0eb5_1672x562.png 1272w, 
https://substackcdn.com/image/fetch/$s_!Gk3v!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0a8c215f-dac3-4e25-aeae-d088a08d0eb5_1672x562.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p><strong>New arena focused on clinical trial predictions</strong> (<a href="https://x.com/endpointarena/status/2043346690963374300">launch post</a>, <a href="https://endpointarena.com/">website</a>). Continues a trend of vertical trading arenas. 
</p><p><strong>GLM 5, a model by publicly traded Chinese lab Z.ai, has been making money on Prediction Arena</strong> (<a href="https://www.predictionarena.ai/">PredictionArena.ai</a>). GLM models score well on long-horizon software engineering tasks. The agent powered by GLM 5 appears to prefer betting on Kalshi markets with asymmetric payoffs (ie ones that trade around 1 cent for a 1 dollar payoff at time of entry). </p><p><strong>Prediction Arena published a paper</strong> (<a href="https://arxiv.org/abs/2604.07355">arXiv.org</a>). Takeaways are that models perform better when they are able to select from a larger universe of markets and that there are diminishing returns to incremental research.</p><h3>Follow for more creative AI use cases</h3><p><em>If you would like to discuss incorporating agents into your research process, reply to this email or reach out via <a href="https://x.com/jdmoran">X</a> or <a href="https://www.linkedin.com/in/jamesdmoran/">LinkedIn</a>.</em></p><p class="button-wrapper" data-attrs="{&quot;url&quot;:&quot;https://blog.flatcircle.ai/subscribe?&quot;,&quot;text&quot;:&quot;Subscribe now&quot;,&quot;action&quot;:null,&quot;class&quot;:null}" data-component-name="ButtonCreateButton"><a class="button primary" href="https://blog.flatcircle.ai/subscribe?"><span>Subscribe now</span></a></p><p></p>]]></content:encoded></item><item><title><![CDATA[Creative LLM use cases - Needles in a haystack, interviewing yourself, email for agents and OpenClaw]]></title><description><![CDATA[Plus: Which model works best for which workflow?]]></description><link>https://blog.flatcircle.ai/p/creative-llm-use-cases-needles-in</link><guid isPermaLink="false">https://blog.flatcircle.ai/p/creative-llm-use-cases-needles-in</guid><dc:creator><![CDATA[Jim Moran]]></dc:creator><pubDate>Fri, 03 Apr 2026 12:03:56 GMT</pubDate><enclosure 
url="https://substackcdn.com/image/fetch/$s_!MhmW!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F03bfd77a-51b4-430d-82ae-9af1c2e03747_1600x976.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p><em>Flat Circle tracks creative use cases of LLMs in hedge funds. If you haven&#8217;t already, join hundreds of PMs, analysts and engineers reading each week:</em></p><p class="button-wrapper" data-attrs="{&quot;url&quot;:&quot;https://blog.flatcircle.ai/subscribe?&quot;,&quot;text&quot;:&quot;Subscribe now&quot;,&quot;action&quot;:null,&quot;class&quot;:null}" data-component-name="ButtonCreateButton"><a class="button primary" href="https://blog.flatcircle.ai/subscribe?"><span>Subscribe now</span></a></p><div><hr></div><h3>Which model works best for which workflow?</h3><p>LLM benchmarks score models against real world tasks designed by experts. These popular ones cover various investment research workflows: accounting (<a href="http://dualentry.com/accounting-ai-benchmark">DualEntry</a>), excel (<a href="https://spreadsheetbench.github.io/">SpreadsheetBench</a>), SEC filings (<a href="https://www.vals.ai/benchmarks/finance_agent">Vals</a>), finance reasoning (<a href="https://labs.scale.com/leaderboard/prbench-finance">PRBench</a>), event forecasting (<a href="https://www.predictionarena.ai/">Prediction Arena</a>), thought partnership (<a href="https://petergpt.github.io/bullshit-benchmark/">BullshitBench</a>) and OpenClaw (<a href="https://pinchbench.com/">PinchBench</a>):</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!MhmW!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F03bfd77a-51b4-430d-82ae-9af1c2e03747_1600x976.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" 
srcset="https://substackcdn.com/image/fetch/$s_!MhmW!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F03bfd77a-51b4-430d-82ae-9af1c2e03747_1600x976.png 424w, https://substackcdn.com/image/fetch/$s_!MhmW!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F03bfd77a-51b4-430d-82ae-9af1c2e03747_1600x976.png 848w, https://substackcdn.com/image/fetch/$s_!MhmW!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F03bfd77a-51b4-430d-82ae-9af1c2e03747_1600x976.png 1272w, https://substackcdn.com/image/fetch/$s_!MhmW!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F03bfd77a-51b4-430d-82ae-9af1c2e03747_1600x976.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!MhmW!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F03bfd77a-51b4-430d-82ae-9af1c2e03747_1600x976.png" width="1456" height="888" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/03bfd77a-51b4-430d-82ae-9af1c2e03747_1600x976.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:888,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:&quot;Hedge Fund Relevant LLM Benchmarks&quot;,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="Hedge Fund Relevant LLM Benchmarks" title="Hedge Fund Relevant LLM Benchmarks" 
srcset="https://substackcdn.com/image/fetch/$s_!MhmW!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F03bfd77a-51b4-430d-82ae-9af1c2e03747_1600x976.png 424w, https://substackcdn.com/image/fetch/$s_!MhmW!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F03bfd77a-51b4-430d-82ae-9af1c2e03747_1600x976.png 848w, https://substackcdn.com/image/fetch/$s_!MhmW!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F03bfd77a-51b4-430d-82ae-9af1c2e03747_1600x976.png 1272w, https://substackcdn.com/image/fetch/$s_!MhmW!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F03bfd77a-51b4-430d-82ae-9af1c2e03747_1600x976.png 1456w" sizes="100vw" fetchpriority="high"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" 
stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p><em>We&#8217;re developing investment research benchmarks for agentic systems (ie not just the models but the prompts, tools, etc), which can drive meaningfully different outcomes. If you&#8217;re interested in collaborating, please reach out.</em></p><h3>Creative LLM use cases</h3><p><strong>Prompt and methodology for identifying &#8216;moving targets&#8217; - flagging whenever management teams change what metrics they highlight</strong> (<a href="https://arxiv.org/pdf/2510.03195">arXiv.org</a>, <a href="https://www.ai-street.co/p/tracking-shifts-in-earnings-call">AI Street</a>)</p><p><strong>YC partner releases open source, AI-native email inbox</strong> (<a href="https://x.com/agupta/status/2038692501536559208">@agupta</a>, <a href="https://github.com/ankitvgupta/mail-app">GitHub</a>). Makes it easier to bring internal and external data into your email flow. 
Alternatively, new YC startup <a href="https://www.agentmail.to/">AgentMail</a> is an easy way to give your agents their own email accounts.</p><p><strong>Coatue-backed long-only fund uses AI to condense overnight research into custom podcasts</strong> (<a href="https://www.advisorperspectives.com/articles/2026/03/27/meet-eve-ai-brain-behind-ex-coatue-traders-fund">Advisor Perspectives</a>)</p><blockquote><p><em>Eve also scours the disclosures of more than 13,000 companies; listens to podcasts; scrutinizes social media posts; summarizes the news; and, each morning, generates a podcast for Kishore to listen to while he drives to work.</em></p></blockquote><p><strong>BullshitBench: When you use an LLM as a thought partner, does it push back when the premise of your question is flawed?</strong> (<a href="https://petergpt.github.io/bullshit-benchmark/">Benchmark</a>, <a href="https://x.com/petergostev">@petergostev</a>, thanks to <a href="https://www.linkedin.com/posts/ainsworld_what-a-great-benchmark-detecting-the-point-share-7439742603891277824-aj5K/">Mark Ainsworth</a>). 
Claude Opus 4.6 lets only 2% of flawed questions through, while GPT 5.4 lets 16% through.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://petergpt.github.io/bullshit-benchmark/" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!F1HT!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff3e7ed8e-0ed5-4d57-8073-dfbf8ed631f9_2766x1340.png 424w, https://substackcdn.com/image/fetch/$s_!F1HT!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff3e7ed8e-0ed5-4d57-8073-dfbf8ed631f9_2766x1340.png 848w, https://substackcdn.com/image/fetch/$s_!F1HT!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff3e7ed8e-0ed5-4d57-8073-dfbf8ed631f9_2766x1340.png 1272w, https://substackcdn.com/image/fetch/$s_!F1HT!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff3e7ed8e-0ed5-4d57-8073-dfbf8ed631f9_2766x1340.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!F1HT!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff3e7ed8e-0ed5-4d57-8073-dfbf8ed631f9_2766x1340.png" width="1456" height="705" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/f3e7ed8e-0ed5-4d57-8073-dfbf8ed631f9_2766x1340.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:705,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:382096,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:&quot;https://petergpt.github.io/bullshit-benchmark/&quot;,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://blog.flatcircle.ai/i/192904005?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff3e7ed8e-0ed5-4d57-8073-dfbf8ed631f9_2766x1340.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!F1HT!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff3e7ed8e-0ed5-4d57-8073-dfbf8ed631f9_2766x1340.png 424w, https://substackcdn.com/image/fetch/$s_!F1HT!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff3e7ed8e-0ed5-4d57-8073-dfbf8ed631f9_2766x1340.png 848w, https://substackcdn.com/image/fetch/$s_!F1HT!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff3e7ed8e-0ed5-4d57-8073-dfbf8ed631f9_2766x1340.png 1272w, https://substackcdn.com/image/fetch/$s_!F1HT!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff3e7ed8e-0ed5-4d57-8073-dfbf8ed631f9_2766x1340.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft 
icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p><strong>Reddit post on using Claude Code to analyze retailers using satellite data</strong> (<a href="https://www.reddit.com/r/ClaudeCode/comments/1rz4us9/i_used_claude_code_to_build_a_satellite_image/">r/ClaudeCode</a>)</p><p><strong>Former hedge fund PM shares prompt for Perplexity Computer to create a guidance credibility analysis</strong> (<a href="https://x.com/FundamentEdge/status/2036487922200035577?s=20">@FundamentalEdge</a>)</p><p><strong>AllianceBernstein AI head uses LLMs to fill in missing time series data</strong> (<a href="https://www.alliancebernstein.com/us/en-us/defined-contribution/insights/investment-insights/staying-grounded-reducing-ai-hallucinations.html">post</a>)</p><blockquote><p><em>Researchers frequently incorporate historical data into their 
analysis, and data may be unavailable for certain time periods&#8212;the dreaded broken time series. Rather than throw away the series, analysts can use AI models to create fill-in data that, in the human expert&#8217;s judgement, may be sensible given the context.</em></p></blockquote><p><strong>TMT investor shares favorite LLM use cases</strong> (<a href="https://x.com/lfg_cap/status/2034867251778920824">@lfg_cap</a>) </p><blockquote><p><em>Option scenario pricing <br>Portfolio optimiser<br>Factor / thematic correlation / analysis / alerter<br>Getting from 0 to 95% on new sub sector <br>Qualitative relative business quality analysis (based on structured questionnaires)<br>New ideas / needle in haystack (parsing through 1000s of emails and twitter messages) for differentiated / contrarian views on different industries / geopolitics etc</em></p></blockquote><p><strong>AlphaSense launches custom agents to run prompts on a schedule, and custom AI expert calls to automatically interview a panel of experts</strong> (<a href="https://www.alpha-sense.com/press/alphasense-scales-workflow-automation-in-financial-firms-and-enterprises/">press release</a>)</p><h3>Interviewing yourself</h3><p><em>Several good pieces this week about documenting your personal workflow into prompts and markdown files.</em></p><p><strong>Example prompt to launch voice interview session - turning open ended discussion into detailed instructions</strong> (<a href="https://www.bensbites.com/p/agents-should-interview-you">Ben&#8217;s Bites</a>)</p><p><strong>More interview prompts that extract context from yourself and make your agents more effective</strong> (<a href="https://x.com/Shpigford/status/2034213621299884395">@Shpigford</a>)</p><p><strong>Lawyer discusses how he embeds his own personal frameworks into skill files</strong> (<a href="https://x.com/zackbshapiro/status/2036791156915290271">@zackbshapiro</a>). 
He also says it&#8217;s impossible to infer someone&#8217;s process simply by looking at their outputs; it needs to come directly from the person: </p><blockquote><p><em>I&#8217;ve had people try to reverse-engineer my Claude skills by studying my outputs, using AI to analyze what I produce and reconstruct the instructions that generated it. They never get close...what my skills actually contain is not a description of what the output should look like. It&#8217;s a detailed operating procedure for how the output gets created: decision trees, analytical frameworks, sequencing logic, edge-case handling, judgment calls about when to be aggressive and when to hold back. You can&#8217;t see any of that by studying the finished product...A finished contract shows you what a great lawyer decided. It doesn&#8217;t show you how she decided it, what she considered and rejected, or the order in which she worked through the issues. The process is invisible in the product. My skills encode the process.</em></p></blockquote><h3>Follow for more investor LLM workflows</h3><p><em>If you would like to discuss incorporating LLMs into your research process, reply to this email or reach out via <a href="https://x.com/jdmoran">X</a> or <a href="https://www.linkedin.com/in/jamesdmoran/">LinkedIn</a>.</em></p><p class="button-wrapper" data-attrs="{&quot;url&quot;:&quot;https://blog.flatcircle.ai/subscribe?&quot;,&quot;text&quot;:&quot;Subscribe now&quot;,&quot;action&quot;:null,&quot;class&quot;:null}" data-component-name="ButtonCreateButton"><a class="button primary" href="https://blog.flatcircle.ai/subscribe?"><span>Subscribe now</span></a></p><p></p>]]></content:encoded></item><item><title><![CDATA[Creative LLM use cases - Job post analysis, say-do scores, merger arb, expert network MCPs]]></title><description><![CDATA[Plus: Guides on Claude Code, Claude Cowork, Claude for Excel and 
more]]></description><link>https://blog.flatcircle.ai/p/creative-llm-use-cases-job-post-analysis</link><guid isPermaLink="false">https://blog.flatcircle.ai/p/creative-llm-use-cases-job-post-analysis</guid><dc:creator><![CDATA[Jim Moran]]></dc:creator><pubDate>Mon, 16 Mar 2026 16:31:22 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!FoT0!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd57c95cc-b206-49a0-b664-610f6214b723_2666x1372.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<h3>GenAI job post analysis</h3><p>We&#8217;ve been collecting hedge fund GenAI job posts over the past month to identify creative LLM use cases, and thought we would analyze them to see what else we could learn:</p><p><strong>Funds with the most job listings</strong></p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!FoT0!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd57c95cc-b206-49a0-b664-610f6214b723_2666x1372.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!FoT0!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd57c95cc-b206-49a0-b664-610f6214b723_2666x1372.png 424w, https://substackcdn.com/image/fetch/$s_!FoT0!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd57c95cc-b206-49a0-b664-610f6214b723_2666x1372.png 848w, https://substackcdn.com/image/fetch/$s_!FoT0!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd57c95cc-b206-49a0-b664-610f6214b723_2666x1372.png 1272w, 
https://substackcdn.com/image/fetch/$s_!FoT0!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd57c95cc-b206-49a0-b664-610f6214b723_2666x1372.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!FoT0!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd57c95cc-b206-49a0-b664-610f6214b723_2666x1372.png" width="1456" height="749" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/d57c95cc-b206-49a0-b664-610f6214b723_2666x1372.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:749,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:&quot;Firm counts chart&quot;,&quot;title&quot;:&quot;Firm counts chart&quot;,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="Firm counts chart" title="Firm counts chart" srcset="https://substackcdn.com/image/fetch/$s_!FoT0!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd57c95cc-b206-49a0-b664-610f6214b723_2666x1372.png 424w, https://substackcdn.com/image/fetch/$s_!FoT0!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd57c95cc-b206-49a0-b664-610f6214b723_2666x1372.png 848w, https://substackcdn.com/image/fetch/$s_!FoT0!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd57c95cc-b206-49a0-b664-610f6214b723_2666x1372.png 1272w, 
https://substackcdn.com/image/fetch/$s_!FoT0!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd57c95cc-b206-49a0-b664-610f6214b723_2666x1372.png 1456w" sizes="100vw" fetchpriority="high"></picture></div></a></figure></div><p><strong>Most common technologies</strong></p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" 
href="https://substackcdn.com/image/fetch/$s_!YM8Y!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F554c2826-fa3e-4449-8e10-5b4b25e42ecd_2666x1372.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!YM8Y!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F554c2826-fa3e-4449-8e10-5b4b25e42ecd_2666x1372.png 424w, https://substackcdn.com/image/fetch/$s_!YM8Y!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F554c2826-fa3e-4449-8e10-5b4b25e42ecd_2666x1372.png 848w, https://substackcdn.com/image/fetch/$s_!YM8Y!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F554c2826-fa3e-4449-8e10-5b4b25e42ecd_2666x1372.png 1272w, https://substackcdn.com/image/fetch/$s_!YM8Y!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F554c2826-fa3e-4449-8e10-5b4b25e42ecd_2666x1372.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!YM8Y!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F554c2826-fa3e-4449-8e10-5b4b25e42ecd_2666x1372.png" width="1456" height="749" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/554c2826-fa3e-4449-8e10-5b4b25e42ecd_2666x1372.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:749,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:&quot;Updated tech sections 
chart&quot;,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="Updated tech sections chart" title="Updated tech sections chart" srcset="https://substackcdn.com/image/fetch/$s_!YM8Y!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F554c2826-fa3e-4449-8e10-5b4b25e42ecd_2666x1372.png 424w, https://substackcdn.com/image/fetch/$s_!YM8Y!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F554c2826-fa3e-4449-8e10-5b4b25e42ecd_2666x1372.png 848w, https://substackcdn.com/image/fetch/$s_!YM8Y!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F554c2826-fa3e-4449-8e10-5b4b25e42ecd_2666x1372.png 1272w, https://substackcdn.com/image/fetch/$s_!YM8Y!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F554c2826-fa3e-4449-8e10-5b4b25e42ecd_2666x1372.png 1456w" sizes="100vw"></picture></div></a></figure></div><p><strong>Key takeaways from the data</strong></p><ul><li><p>Average annual salary is $212.5K, with a range from $150K to $300K</p></li><li><p>Teams prefer vendors for models and storage, and open source for everything else</p></li><li><p>OpenAI is mentioned twice as often as Anthropic</p></li><li><p>AWS is the most popular hyperscaler</p></li><li><p>While Balyasny, Millennium and Point72 dominate hiring, we did not find evidence of open fundamental-focused GenAI roles at Citadel</p></li></ul><p>If you&#8217;d like the full dataset, please reach out.</p><h3>Creative LLM use cases</h3><p><strong>Bloomberg Businessweek used Claude to review 1,500 hours of livestream footage of influencers playing Stake, a crypto gambling site, to reveal the company was rigging bets (<a href="https://www.bloomberg.com/features/2026-stake-drake-crypto-casino-adin-ross-gambling/">Bloomberg</a>, thanks to <a href="https://www.thediff.co/archive/longreads-open-thread-172/?ref=the-diff-newsletter">Byrne Hobart</a>)</strong></p><blockquote><p><em>Reporters used Anthropic&#8217;s Claude, a large language model, to analyze footage frame by frame and determine the balance, bet and games being played during livestreams</em></p></blockquote><p><strong>An investor vibe coded an LLM that identifies past CEO claims and whether they bore out into a 
&#8220;say-do&#8221; score for every management team</strong> (<a href="https://x.com/dahu7744/status/2033207935367590337">@dahu7744</a>)</p><p><strong>Good thread on the process of iterating with Claude Code until it can correctly generate Excel models</strong> (<a href="https://x.com/thomasrice_au/status/2031832736911298729">@thomasrice_au</a>). Related: OpenAI launched ChatGPT for Excel (<a href="https://openai.com/index/chatgpt-for-excel/">OpenAI</a>)</p><p><strong>Guide to using Claude Code / Cowork for investment research by CEO of Daloopa</strong> (<a href="https://x.com/oneThomasli/status/2030512369172946970">@oneThomasli</a>)</p><p><strong>JPAM hiring data scientist to analyze sellside notes and news to identify trending and emerging themes</strong> (<a href="https://www.linkedin.com/jobs/view/asset-management-data-science-vp-at-jpmorganchase-4375176406/">LinkedIn</a>)</p><p><strong>Case study on the internal alternative-data LLM chat tool used by Jefferies equity research</strong> (<a href="https://www.databricks.com/blog/jefferies-modernizes-equity-research-scale-databricks-and-agentic-analytics">Databricks</a>)</p><blockquote><p><em>This multi-source response surfaces analytical angles that analysts may not have explicitly requested, enabling corroboration across independent sources.</em></p></blockquote><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://www.databricks.com/blog/jefferies-modernizes-equity-research-scale-databricks-and-agentic-analytics" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!_Kd6!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd3187069-a848-437b-8674-c087f76a463d_840x250.png 424w, 
https://substackcdn.com/image/fetch/$s_!_Kd6!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd3187069-a848-437b-8674-c087f76a463d_840x250.png 848w, https://substackcdn.com/image/fetch/$s_!_Kd6!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd3187069-a848-437b-8674-c087f76a463d_840x250.png 1272w, https://substackcdn.com/image/fetch/$s_!_Kd6!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd3187069-a848-437b-8674-c087f76a463d_840x250.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!_Kd6!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd3187069-a848-437b-8674-c087f76a463d_840x250.png" width="840" height="250" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/d3187069-a848-437b-8674-c087f76a463d_840x250.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:250,&quot;width&quot;:840,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:107125,&quot;alt&quot;:&quot;&quot;,&quot;title&quot;:&quot;&quot;,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:&quot;https://www.databricks.com/blog/jefferies-modernizes-equity-research-scale-databricks-and-agentic-analytics&quot;,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://blog.flatcircle.ai/i/190658415?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd3187069-a848-437b-8674-c087f76a463d_840x250.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" title="" 
srcset="https://substackcdn.com/image/fetch/$s_!_Kd6!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd3187069-a848-437b-8674-c087f76a463d_840x250.png 424w, https://substackcdn.com/image/fetch/$s_!_Kd6!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd3187069-a848-437b-8674-c087f76a463d_840x250.png 848w, https://substackcdn.com/image/fetch/$s_!_Kd6!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd3187069-a848-437b-8674-c087f76a463d_840x250.png 1272w, https://substackcdn.com/image/fetch/$s_!_Kd6!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd3187069-a848-437b-8674-c087f76a463d_840x250.png 1456w" sizes="100vw" loading="lazy"></picture></div></a></figure></div><p><strong>Roundup of internal LLM tools at BAM, Citadel, Point72, etc</strong> (<a href="https://x.com/TheValueist/status/2029992969680404840">@TheValueist</a>)</p><h3>Merger arb</h3><p><strong>A merger-arb ETF manager describes his LLM system</strong> (<a href="https://x.com/JulianKlymochko/status/2031050830598869205">@JulianKlymochko</a>):</p><blockquote><p><em>For example, we have Equity Research Analyst agent that writes an initiating coverage report on the merger target. Next, we have our Legal M&amp;A Analyst agent, that summarizes merger agreements and proxy statements, and our Antitrust Analyst agent, that analyzes market shares along with the DOJ / FTC, EC, China SAMR, and other global regulators would view the deal, in addition to ascribing probabilities of antitrust clearance / 2nd requests / merger challenges.</em></p></blockquote><p><strong>How Balyasny built its merger arb bot </strong>(<a href="https://openai.com/index/balyasny-asset-management/">OpenAI</a>):</p><blockquote><p><em>early feedback from merger arbitrage teams revealed that agents needed to continuously re-evaluate deal probabilities as new filings or press releases came in. The Balyasny team quickly extended agent planning capabilities and tool access, replacing a slow, manual workflow with real-time probabilistic monitoring</em></p></blockquote><h3>Expert network MCP</h3><p><strong>Third Bridge launched an MCP for its transcript library</strong> (<a href="https://www.thirdbridge.com/en-us/about-us/media/perspectives/mcp-vs-traditional-expert-networks">press release</a>). 
AlphaSense/Tegus <a href="https://developer.alpha-sense.com/agent-api/mcp-server">offers one</a> as well, but GLG and Guidepoint currently do not have public MCP or API endpoints.</p><p>My take: it seems likely that all the major transcript libraries will offer MCP / API access soon. Demand for transcript content should increase meaningfully, since agents aren&#8217;t limited by a human&#8217;s capacity to read transcripts. However, I expect pricing to fall further as the cost of searching across providers goes down. Networks will compete on exclusive access to experts, and unique experts will benefit accordingly. I also wouldn&#8217;t be surprised to see consolidation.</p><h3><strong>Follow for more investor LLM workflows</strong></h3><p class="button-wrapper" data-attrs="{&quot;url&quot;:&quot;https://blog.flatcircle.ai/subscribe?&quot;,&quot;text&quot;:&quot;Subscribe now&quot;,&quot;action&quot;:null,&quot;class&quot;:null}" data-component-name="ButtonCreateButton"><a class="button primary" href="https://blog.flatcircle.ai/subscribe?"><span>Subscribe now</span></a></p><p><em>If you would like to discuss incorporating LLMs into your research process, reply to this email or reach out via <a href="https://x.com/jdmoran">X</a> or <a href="https://www.linkedin.com/in/jamesdmoran/">LinkedIn</a>.</em></p>]]></content:encoded></item><item><title><![CDATA[Models deliver Beta, Humans deliver Alpha]]></title><description><![CDATA[Plus: Howard Marks, vibecoding pixel trackers, monitoring foreign language media and Claude Cowork]]></description><link>https://blog.flatcircle.ai/p/models-deliver-beta-humans-deliver</link><guid isPermaLink="false">https://blog.flatcircle.ai/p/models-deliver-beta-humans-deliver</guid><dc:creator><![CDATA[Jim Moran]]></dc:creator><pubDate>Thu, 05 Mar 2026 15:26:28 GMT</pubDate><enclosure 
url="https://substackcdn.com/image/fetch/$s_!Ojp7!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4990a35a-669f-4966-bdd0-bb88bd2a210e_1512x850.jpeg" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p><em>Flat Circle tracks creative use cases of LLMs in hedge funds. If you haven&#8217;t already, join hundreds of PMs, analysts and engineers reading each week:</em></p><p class="button-wrapper" data-attrs="{&quot;url&quot;:&quot;https://blog.flatcircle.ai/subscribe?&quot;,&quot;text&quot;:&quot;Subscribe now&quot;,&quot;action&quot;:null,&quot;class&quot;:null}" data-component-name="ButtonCreateButton"><a class="button primary" href="https://blog.flatcircle.ai/subscribe?"><span>Subscribe now</span></a></p><div><hr></div><h3>Models deliver Beta, Humans deliver Alpha</h3><p><strong>Harvard study shows AI can predict 71% of mutual fund trading decisions, but the remaining 29% of trades generate the most alpha</strong> (<a href="https://www.bloomberg.com/opinion/newsletters/2026-02-24/ai-can-manage-your-mutual-fund">Matt Levine</a> via <a href="https://x.com/justinaknope/status/2026346390478004512">Justina Lee</a>)</p><p><strong>Howard Marks describes the human x-factor</strong> (<a href="https://www.oaktreecapital.com/insights/memo/ai-hurtles-ahead">Oaktree LP Letter</a>)</p><blockquote><p><em>Great investors&#8230;have to be strong exactly where Claude admits AI might be weakest: in dealing with novel developments where there&#8217;s not enough prior experience for dependable patterns to have been compiled (and learned by AI during its training). They also have to make subjective decisions regarding qualitative factors and exercise taste and discernment. For instance, choosing the right counterparties has played an important part in Oaktree&#8217;s success. How will AI make judgments of that sort? And there&#8217;s something else: AI doesn&#8217;t have skin in the game. 
It doesn&#8217;t feel the weight of concentrated positions or the fear of capital loss. Its willingness to take risk might not be constrained by humans&#8217; normal risk aversion. The best investors sense potential risk intuitively, and this contributes greatly to their success.</em></p><p><em>Especially when investors are dealing with new and untried products, CEOs, or industries, there can be few facts or analogous experiences, meaning we have to rely on &#8220;opinion or speculation.&#8221; Given the limitations discussed above on AI&#8217;s ability to tackle brand new situations, will its speculation about new things &#8211; as opposed to extrapolating historic patterns &#8211; be consistently superior to that of all humans? I believe there will continue to be human investors who are superior to AI, since I don&#8217;t think AI will be able to do an unbeatable job of these things.</em></p></blockquote><p>My take: LLMs level the playing field in processing public information and increase the reward for proprietary research, personal relationships and experience.</p><h3>Creative LLM use cases</h3><p><strong>Hayden Capital vibecoded a pixel tracker for AppLovin (APP)</strong> (<a href="https://content.haydencapital.com/Hayden-Capital-Quarterly-Letter-2025-Q4.pdf">LP Letter</a>) </p><blockquote><p><em>For example, I recently &#8220;vibe-coded&#8221; our own Applovin Axon Pixel Tracker, to track Applovin&#8217;s new ecommerce push (LINK). The program scans the top 100,000 ecommerce websites, and whether they&#8217;ve adopted Applovin&#8217;s ecommerce tools &#8211; useful for us tracking adoption in real time. 
I did this all with Claude Code, in just a couple hours over a weekend, and runs on Amazon&#8217;s AWS.</em></p></blockquote><p><strong>Norges Bank uses Claude to monitor foreign language media for ESG issues in its portfolio</strong> (<a href="https://www.cnbc.com/2026/02/26/norway-sovereign-wealth-fund-nbim-investment-ai-esg-claude.html">CNBC</a>)</p><blockquote><p><em>&#8220;Often, this information has not been captured in international media coverage or data vendor alerts&#8230;In multiple instances, we identified and sold these investments before the broader market reacted to the risks, avoiding potential losses.&#8221; NBIM said using AI this way had been particularly valuable for researching smaller companies in emerging markets, where news about the firm may be limited to small media outlets in local languages.</em></p></blockquote><p><strong>VC shares which of his workflows are mostly code vs mostly LLM driven</strong> (<a href="https://x.com/ttunguz/status/2027105187069059319">@ttunguz</a>)</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://x.com/ttunguz/status/2027105187069059319" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!Ojp7!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4990a35a-669f-4966-bdd0-bb88bd2a210e_1512x850.jpeg 424w, https://substackcdn.com/image/fetch/$s_!Ojp7!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4990a35a-669f-4966-bdd0-bb88bd2a210e_1512x850.jpeg 848w, https://substackcdn.com/image/fetch/$s_!Ojp7!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4990a35a-669f-4966-bdd0-bb88bd2a210e_1512x850.jpeg 1272w, 
https://substackcdn.com/image/fetch/$s_!Ojp7!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4990a35a-669f-4966-bdd0-bb88bd2a210e_1512x850.jpeg 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!Ojp7!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4990a35a-669f-4966-bdd0-bb88bd2a210e_1512x850.jpeg" width="1456" height="819" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/4990a35a-669f-4966-bdd0-bb88bd2a210e_1512x850.jpeg&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:819,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:&quot;Image&quot;,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:&quot;https://x.com/ttunguz/status/2027105187069059319&quot;,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="Image" title="Image" srcset="https://substackcdn.com/image/fetch/$s_!Ojp7!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4990a35a-669f-4966-bdd0-bb88bd2a210e_1512x850.jpeg 424w, https://substackcdn.com/image/fetch/$s_!Ojp7!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4990a35a-669f-4966-bdd0-bb88bd2a210e_1512x850.jpeg 848w, https://substackcdn.com/image/fetch/$s_!Ojp7!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4990a35a-669f-4966-bdd0-bb88bd2a210e_1512x850.jpeg 1272w, 
https://substackcdn.com/image/fetch/$s_!Ojp7!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4990a35a-669f-4966-bdd0-bb88bd2a210e_1512x850.jpeg 1456w" sizes="100vw" loading="lazy"></picture></div></a></figure></div><p><strong>BlackRock and Schonfeld are hiring AI engineers for post-trade operations</strong> (<a href="https://www.linkedin.com/jobs/view/senior-ai-engineer-genai-ml-aladdin-engineering-vice-president-at-blackrock-4375829390/">BlackRock</a>, <a 
href="https://www.linkedin.com/jobs/view/2026-quantitative-developer-execution-services-sophomore-summer-internship-at-schonfeld-4375429840/">Schonfeld</a>)</p><p><strong>AWS Bedrock post on using their graphRAG workflow to analyze 10-K filings and identify shared risk relationships across the S&amp;P 100</strong> (<a href="https://aws.amazon.com/blogs/industries/agentic-graphrag-for-capital-markets/">AWS</a>)</p><p><strong>Rutgers finance professor shares 8 tips for using OpenAI Batch API (50% cheaper) for large scale transcript analysis</strong> (<a href="https://www.linkedin.com/posts/gatsby-zhang_openai-batchapi-llm-share-7433231811712884736-Hq5M/">LinkedIn</a>)</p><h3>Interesting paper</h3><p><strong>Compares LLM use in stock pitches on Seeking Alpha vs r/WallStreetBets.</strong> AI drives better returns in the former, more professional community, while on r/WallStreetBets AI drives abnormal trading and lottery outcomes. Interesting take on how LLMs may impact retail and institutional trading (<a href="https://www.nber.org/papers/w34807">NBER</a>)</p><h3>New tools</h3><p><strong>Review of Claude Cowork, a non-technical desktop version of Claude Code</strong> (<a href="https://www.buysideaireviews.com/p/claude-cowork-review-1">Buyside AI Reviews</a>). When asked to turn a lender presentation into a leveraged loan screening model, the tool made several key errors: </p><blockquote><p><em>I do think people underestimate (i) the amount of time it takes to check/correct output and (ii) the willingness of senior folks to actually do the checking. And given the black box nature of LLM reasoning, the checking needs may not scale down as fast as AI capabilities scale up.</em></p></blockquote><p>More: Claude launches Cowork and plugins for finance (<a href="https://claude.com/blog/cowork-plugins-finance">announcement</a>)</p><p><strong>Checkmate, another AI expert call service, launches</strong> (<a href="https://checkmateresearch.ai/">CheckmateResearch.ai</a>). 
Other services where LLMs source and conduct interviews include: <a href="https://www.alpha-sense.com/press/alphasense-launches-autonomous-ai-agent-interviewer-debuts-channel-checks-to-deliver-real-time-market-signals-across-all-sectors-of-the-economy/">AlphaSense</a>, <a href="https://www.guidepoint.com/what-we-offer/ai-moderation/">Guidepoint</a>, <a href="https://www.newtonx.com/article/ai-moderated-research/">NewtonX</a>, <a href="https://qualitate.io/">Qualitate</a>, <a href="https://www.ribbon.ai/">Ribbon</a>, <a href="https://www.synquery.ai/">Synquery</a>. Interesting Reddit thread where experts debate the future of this format: &#8220;<a href="https://www.reddit.com/r/expertnetworks/comments/1r2i1a9/please_boycott_ai_mod_calls/">Please boycott ai mod calls</a>&#8221;</p><p><strong>FirstDraftResearch, which looks like a &#8220;cursor for public market investors,&#8221; announces private beta</strong> (<a href="https://x.com/atelicinvest/status/2026753156257034263?s=20">@atelicinvest</a>)</p><p><strong>Bloomberg launches &lt;ASKB&gt;, a conversational AI interface</strong> (<a href="https://www.bloomberg.com/company/stories/meet-askb-bloomberg-introduces-agentic-ai-to-the-bloomberg-terminal/">Bloomberg</a>)</p><h3>Follow for more investor LLM workflows</h3><p class="button-wrapper" data-attrs="{&quot;url&quot;:&quot;https://blog.flatcircle.ai/subscribe?&quot;,&quot;text&quot;:&quot;Subscribe now&quot;,&quot;action&quot;:null,&quot;class&quot;:null}" data-component-name="ButtonCreateButton"><a class="button primary" href="https://blog.flatcircle.ai/subscribe?"><span>Subscribe now</span></a></p><p><em>If you would like to discuss incorporating LLMs into your research process, reply to this email or reach out via <a href="https://x.com/jdmoran">X</a> or <a href="https://www.linkedin.com/in/jamesdmoran/">LinkedIn</a>.</em></p>]]></content:encoded></item><item><title><![CDATA[Creative LLM workflows - OpenClaw]]></title><description><![CDATA[Plus 13Fs analyzer, 
alternative data processor and more prediction arenas]]></description><link>https://blog.flatcircle.ai/p/creative-llm-workflows-openclaw</link><guid isPermaLink="false">https://blog.flatcircle.ai/p/creative-llm-workflows-openclaw</guid><dc:creator><![CDATA[Jim Moran]]></dc:creator><pubDate>Mon, 23 Feb 2026 13:03:20 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!EqX-!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fcf1fc57f-c1c3-4af3-b2d8-c18271f5a400_1158x930.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>Flat Circle tracks creative use cases of LLMs in hedge funds. If you haven&#8217;t already, join hundreds of PMs, analysts and engineers reading each week:</p><p class="button-wrapper" data-attrs="{&quot;url&quot;:&quot;https://blog.flatcircle.ai/subscribe?&quot;,&quot;text&quot;:&quot;Subscribe now&quot;,&quot;action&quot;:null,&quot;class&quot;:null}" data-component-name="ButtonCreateButton"><a class="button primary" href="https://blog.flatcircle.ai/subscribe?"><span>Subscribe now</span></a></p><div><hr></div><h3>OpenClaw</h3><p><em>OpenClaw (fka ClawdBot, Moltbot) allows a computer to run an LLM system in an &#8220;always-on&#8221; way and interact with almost anything, making it feel more like a proactive analyst that you can train and work with via Slack, email, Signal, etc.</em></p><p><strong>How to build an OpenClaw investment research analyst</strong> (<a href="https://saulius.io/blog/openclaw-ai-investment-analyst-automated-financial-research">Saulius</a>) </p><blockquote><p><em>OpenClaw changes the equation. It is an open-source, self-hosted AI agent platform that runs persistently on your machine, connects to every messaging platform you use, and has access to a full suite of tools -- file operations, web search, browser control, code execution, and long-term memory. It does not just answer questions. 
It reads research reports, builds financial models, monitors markets around the clock, learns from its own experience, and proactively alerts you when something needs your attention.</em></p></blockquote><p><strong>Institutional Investor warns funds not to build their own OpenClaw</strong> (<a href="https://www.institutionalinvestor.com/article/openclaw-ai-agent-institutional-investors-need-understand-shouldnt-touch">Institutional Investor</a>)</p><p><strong>Data scientist at real estate asset manager analyzes 120 data center projects by talking to OpenClaw via WhatsApp</strong> (<a href="https://infrastructureresearch.substack.com/p/no-news-is-bad-news">Infrastructure Research</a>)</p><p><strong>Two funds are already hiring engineers to build with OpenClaw</strong> (<a href="https://apply.workable.com/moreton-capital-partners/j/B09073E471/">job post</a>, <a href="https://www.linkedin.com/jobs/view/quantitative-security-engineer-reverse-engineering-systems-automation-at-summit-peak-advisors-4371942925/">job post</a>)</p><p><strong>Upwork post by small event-driven fund requesting an OpenClaw screening system</strong> (<a href="https://www.upwork.com/freelance-jobs/apply/Agent-evaluate-trading-opportunities-for-hedge-fund_~022020884626293722124/">job post</a>)</p><p><em>We are actively exploring how to safely use OpenClaw for investor workflows; please reply to this email to discuss.</em></p><h3>Creative LLM workflows</h3><p><strong>Founder of AI-native hedge fund details how he builds with Claude Code</strong> (<a href="https://x.com/thomasrice_au/status/2022819861022544334">@thomasrice_au</a>)</p><blockquote><p><em>If it's front end or something we interact with, I start by describing what I need to do and initial ideas for interface. I'll then generate 30 mockups (10 each from GPT, Claude, Kimi), asking them to make each quite different, and to make each one an isolated html file with embedded JS and CSS. 
I'll then go through the 30, dismiss most, keep a few, then keep iterating until it feels right for what I want it to do.</em></p></blockquote><p><strong>LLM workflow to analyze new 13F holdings and generate a list of relevant positions and the reason each fund likely owns them</strong> (<a href="https://x.com/FundamentEdge/status/2024198857689866367?s=20">@FundamentalEdge</a>)</p><p><strong>Bonus: New feature from Polymarket allows anyone to pledge rewards to encourage more research</strong> (<a href="https://x.com/Polymarket/status/2023868714794578408">@polymarket</a>, thanks to <a href="https://x.com/adrien_nav">@adrien_nav</a>). Currently, the markets with the largest sponsored rewards cover whether the US strikes Iran, Fed decisions and the S&amp;P.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://x.com/Polymarket/status/2023868714794578408" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!EqX-!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fcf1fc57f-c1c3-4af3-b2d8-c18271f5a400_1158x930.png 424w, https://substackcdn.com/image/fetch/$s_!EqX-!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fcf1fc57f-c1c3-4af3-b2d8-c18271f5a400_1158x930.png 848w, https://substackcdn.com/image/fetch/$s_!EqX-!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fcf1fc57f-c1c3-4af3-b2d8-c18271f5a400_1158x930.png 1272w, https://substackcdn.com/image/fetch/$s_!EqX-!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fcf1fc57f-c1c3-4af3-b2d8-c18271f5a400_1158x930.png 1456w" sizes="100vw"><img 
src="https://substackcdn.com/image/fetch/$s_!EqX-!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fcf1fc57f-c1c3-4af3-b2d8-c18271f5a400_1158x930.png" width="1158" height="930" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/cf1fc57f-c1c3-4af3-b2d8-c18271f5a400_1158x930.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:930,&quot;width&quot;:1158,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:245978,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:&quot;https://x.com/Polymarket/status/2023868714794578408&quot;,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://blog.flatcircle.ai/i/188724592?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fcf1fc57f-c1c3-4af3-b2d8-c18271f5a400_1158x930.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!EqX-!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fcf1fc57f-c1c3-4af3-b2d8-c18271f5a400_1158x930.png 424w, https://substackcdn.com/image/fetch/$s_!EqX-!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fcf1fc57f-c1c3-4af3-b2d8-c18271f5a400_1158x930.png 848w, https://substackcdn.com/image/fetch/$s_!EqX-!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fcf1fc57f-c1c3-4af3-b2d8-c18271f5a400_1158x930.png 1272w, 
https://substackcdn.com/image/fetch/$s_!EqX-!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fcf1fc57f-c1c3-4af3-b2d8-c18271f5a400_1158x930.png 1456w" sizes="100vw" loading="lazy"></picture></div></a></figure></div><h3>Interesting jobs</h3><p><strong>Point72 hiring GenAI engineer focused on alternative data</strong> (<a href="https://careers.point72.com/CSJobDetail?jobName=data-scientist-proprietary-research&amp;jobCode=PIT-0013870">job post</a>)</p><h3>Updates from trading arenas</h3><p><strong>In Prediction Arena, all models turned negative this week </strong>(<a 
href="https://www.predictionarena.ai/">predictionarena.ai</a>)</p><p>My take: it appears the models made a few concentrated bets that were impacted by extreme weather and a surprising unemployment report. Like a lot of the other trading models, they often win for a period but are actually selling vol. Read more on AI trading arenas <a href="https://blog.flatcircle.ai/p/ai-trading-arenas">here</a>.</p><p><strong>Research paper outlines findings from a nine-month LLM-driven trading strategy</strong>. Findings include a Sharpe ratio of 2.43 and skill at identifying longs but not shorts (<a href="https://arxiv.org/pdf/2601.11958">arXiv.org</a>)</p><p><strong>Chamath Palihapitiya publishes paid post covering trading arenas</strong> (<a href="https://chamath.substack.com/p/autonomous-investing">Substack</a>)</p><h3><strong>Follow for more investor LLM workflows</strong></h3><p class="button-wrapper" data-attrs="{&quot;url&quot;:&quot;https://blog.flatcircle.ai/subscribe?&quot;,&quot;text&quot;:&quot;Subscribe now&quot;,&quot;action&quot;:null,&quot;class&quot;:null}" data-component-name="ButtonCreateButton"><a class="button primary" href="https://blog.flatcircle.ai/subscribe?"><span>Subscribe now</span></a></p><p><em>If you would like to discuss incorporating LLMs or OpenClaw into your research process, reply to this email or reach out via <a href="https://x.com/jdmoran">X</a> or <a href="https://www.linkedin.com/in/jamesdmoran/">LinkedIn</a>.</em></p>]]></content:encoded></item><item><title><![CDATA[Creative LLM use cases - Dan Loeb, Baupost, Altimeter, YCombinator, Ralph Wiggum]]></title><description><![CDATA[Plus: 8 new workflows, 3 new tools and every AI trading arena]]></description><link>https://blog.flatcircle.ai/p/creative-llm-use-cases-dan-loeb-baupost</link><guid isPermaLink="false">https://blog.flatcircle.ai/p/creative-llm-use-cases-dan-loeb-baupost</guid><dc:creator><![CDATA[Jim Moran]]></dc:creator><pubDate>Wed, 11 Feb 2026 13:02:55 GMT</pubDate><enclosure 
url="https://substackcdn.com/image/fetch/$s_!PwQT!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Faa761cc9-4a10-4384-a627-deb7b35ee1dd_1306x622.jpeg" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>Flat Circle tracks creative use cases of LLMs in hedge funds. If you haven&#8217;t already, join hundreds of PMs, analysts and engineers reading each week:</p><p class="button-wrapper" data-attrs="{&quot;url&quot;:&quot;https://blog.flatcircle.ai/subscribe?&quot;,&quot;text&quot;:&quot;Subscribe now&quot;,&quot;action&quot;:null,&quot;class&quot;:null}" data-component-name="ButtonCreateButton"><a class="button primary" href="https://blog.flatcircle.ai/subscribe?"><span>Subscribe now</span></a></p><div><hr></div><h3>Investor LLM workflows</h3><p><strong>Dan Loeb asked Claude which public companies it might disrupt (<a href="https://assets-malibu-life.s3.us-west-2.amazonaws.com/system/uploads/fae/file/asset/1689/Third_Point_Q4_2025_Investor_Letter_TPIL.pdf">Third Point 4Q25 LP Letter</a>)</strong></p><blockquote><p><em>A simple query into Claude&#8217;s chatbot: &#8220;Which companies is Anthropic capable of dislocating or disrupting?&#8221; yields some fascinating results and was in our view a fruitful source of hedges for our firm.</em></p></blockquote><p>My take: the subtext here is that models are evolving into the source of conventional wisdom. </p><p><strong>AI-driven short seller Abelian Analysis analyzed hundreds of YouTube transcripts to assess the pricing environment for CVNA&#8217;s used vehicles (<a href="https://abeliananalysis.com/posts/carvana-short-thesis/#youtube-sentiment-building-a-signal-from-noise">Short Report</a>, <a href="https://github.com/Abelian-Analysis/Sentiment-Analysis">GitHub</a>)</strong></p><blockquote><p><em>Each transcript was analyzed by Claude Sonnet 4 using a prompt designed to separate market conditions from creator mood. This distinction is critical. 
A dealer complaining about thin margins is telling you margins are compressed &#8212; a bearish market signal. A flipper excited about &#8220;deals everywhere&#8221; is telling you inventory is high and prices are soft &#8212; also bearish. The LLM was instructed to ignore emotional spin and extract the underlying market reality across three categorical signals (inventory direction, demand strength, repossession activity), two continuous scores (bullish 0-100, bearish 0-100), and a sensationalism rating (1-10) that we use for quality control.</em></p></blockquote><p><strong>Investor from Altimeter Capital outlines two LLM workflows, including a PDF example from a &#8220;council of LLMs that rigorously debate topics with access to web search&#8221; (<a href="https://x.com/_clarktang/status/2019904276429172955?s=20">@_clarktang</a>, thank you <a href="https://x.com/realLigerCub">@realLigerCub</a>)</strong></p><p><strong>An RIA PM shares his deep research prompt (<a href="https://x.com/TedHZhang/status/2020502698995405097?s=20">@TedHZhang</a>)</strong></p><p><strong>AI-native hedge fund, Minotaur Capital, used &#8220;Ralph Wiggum&#8221; style iterative research loop to determine that gaming stocks were oversold following the Genie 3 release (<a href="https://www.minotaurcapital.com/reports/monthly/2026-01">Minotaur January Letter</a>)</strong></p><blockquote><p><em>We immediately spun up a research process using the iterative techniques we described in our <a href="https://www.minotaurcapital.com/reports/quarterly/2025-12">December Quarterly</a>. From a 127-word prompt asking for implications on the games industry, our AI system iteratively chose what to explore: value chain analysis, five-year scenarios with falsifiable signposts, unit economics ($/minute cost models), IP and licensing questions, and a winners/losers matrix across engines, platforms, and publishers. 
Over 50 iterations it built out each section, cited sources, and stress-tested its own conclusions.</em></p></blockquote><p><strong>Former Baupost investor Dave Plon shares an AI workflow around killing ideas faster: develop a list of non-negotiables (e.g., CEO compensation structure, guidance track record), and have the system eliminate every name in your coverage failing those non-negotiables. (<a href="https://colossus.com/episode/how-investors-are-using-ai/">Business Breakdowns</a>, 12m 50s)</strong></p><p><strong>Former Maverick / DE Shaw / Citadel PM shares his prompt to analyze the technical setup (<a href="https://x.com/FundamentEdge/status/2019108791518917017">@FundamentalEdge</a>, thank you <a href="https://x.com/realLigerCub">@realLigerCub</a>)</strong></p><p><strong>New paper analyzes the impact of LLM tools like ChatGPT on price reactions during earnings calls. Stocks don&#8217;t react faster, but at a greater magnitude after a delay due to model latency and transcription availability (<a href="https://papers.ssrn.com/sol3/papers.cfm?abstract_id=6105146">Price Discovery Within Earnings Calls</a>, thank you <a href="https://x.com/justinaknope">Justina Lee</a>)</strong></p><h3>Cons to LLM investor workflows</h3><p><strong>An investor argues LLMs make it harder to build conviction (<a href="https://x.com/evrgn11112231/status/2020138476188889490">@evrgn11112231</a>)</strong></p><blockquote><p><em>I view investment research as akin to the slow LLM training process. It&#8217;s not supposed to be fast. The goal is to ingest raw data over long periods of time to train your brain (the ultimate LLM) for instant recall and pattern matching later.</em></p></blockquote><p><strong>Former credit investor warns on using LLMs for initiation style reports, as they often miss key events that would materially alter the narrative (<a href="https://x.com/BuysideAIreview/status/2019835540435046443">@BuysideAIReview</a>). 
</strong>My take: the quality of an LLM workflow is only as good as its eval. Before asking for a deep dive, first build a list of key events, transactions, players, etc - then run deep research as a loop until it hits all the required items. See the &#8220;Ralph Wiggum&#8221; discussion above.</p><h3>Interesting LLM tools</h3><p><strong>Former mega fund and credit hedge fund investor reviewing every buyside AI tool (<a href="https://www.buysideaireviews.com/">Buyside AI Reviews</a>)</strong></p><p><strong>Former hedge fund PM previews an insights feed built from earnings call transcripts (<a href="https://x.com/atelicinvest/status/2020343503411401157">@atelicinvest</a>)</strong></p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://x.com/atelicinvest/status/2020343503411401157" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!PwQT!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Faa761cc9-4a10-4384-a627-deb7b35ee1dd_1306x622.jpeg 424w, https://substackcdn.com/image/fetch/$s_!PwQT!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Faa761cc9-4a10-4384-a627-deb7b35ee1dd_1306x622.jpeg 848w, https://substackcdn.com/image/fetch/$s_!PwQT!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Faa761cc9-4a10-4384-a627-deb7b35ee1dd_1306x622.jpeg 1272w, https://substackcdn.com/image/fetch/$s_!PwQT!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Faa761cc9-4a10-4384-a627-deb7b35ee1dd_1306x622.jpeg 1456w" sizes="100vw"><img 
src="https://substackcdn.com/image/fetch/$s_!PwQT!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Faa761cc9-4a10-4384-a627-deb7b35ee1dd_1306x622.jpeg" width="1306" height="622" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/aa761cc9-4a10-4384-a627-deb7b35ee1dd_1306x622.jpeg&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:622,&quot;width&quot;:1306,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:&quot;Image&quot;,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:&quot;https://x.com/atelicinvest/status/2020343503411401157&quot;,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="Image" title="Image" srcset="https://substackcdn.com/image/fetch/$s_!PwQT!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Faa761cc9-4a10-4384-a627-deb7b35ee1dd_1306x622.jpeg 424w, https://substackcdn.com/image/fetch/$s_!PwQT!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Faa761cc9-4a10-4384-a627-deb7b35ee1dd_1306x622.jpeg 848w, https://substackcdn.com/image/fetch/$s_!PwQT!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Faa761cc9-4a10-4384-a627-deb7b35ee1dd_1306x622.jpeg 1272w, https://substackcdn.com/image/fetch/$s_!PwQT!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Faa761cc9-4a10-4384-a627-deb7b35ee1dd_1306x622.jpeg 1456w" sizes="100vw" loading="lazy"></picture></div></a></figure></div><p><strong>New tool benchmarks the stock impact of short reports / short selling firms (<a href="https://shortreportimpact.com/">ShortReportImpact</a>)</strong></p><h3>YCombinator Request for Startup: AI-Native Hedge Funds</h3><p>YCombinator just published its latest <a href="https://www.ycombinator.com/rfs#ai-native-hedge-funds">Request for Startups</a>:</p><blockquote><p><em>&#8230;the next Renaissance, Bridgewater, and D.E. Shaw's are going to be built on AI. The biggest funds in the world have been slow to adapt. I worked as a quant researcher at one of these funds, and when I asked compliance to let us use ChatGPT, I didn't even get a response. 
It made it clear to me that the hedge funds of the future won't just bolt AI onto their existing strategies. They'll use it to come up with entirely new ones. That's where the alpha is.</em></p></blockquote><h3>Now tracking every AI trading arena</h3><p>AI trading arenas are public experiments where LLMs perform research and trade in a live environment. They are one way to track LLM progress in making investment decisions. </p><p>Our new page tracking every public arena is here: <strong><a href="https://blog.flatcircle.ai/p/ai-trading-arenas">AI Trading Arenas</a></strong></p><p>Key takeaways include: (i) the median model always loses money, (ii) newer frontier models outperform the older models, (iii) soon-to-be-released Grok 4.2 is undefeated, (iv) Claude has not yet won any trading arena.</p><h3><strong>Follow for more investor LLM workflows</strong></h3><p class="button-wrapper" data-attrs="{&quot;url&quot;:&quot;https://blog.flatcircle.ai/subscribe?&quot;,&quot;text&quot;:&quot;Subscribe now&quot;,&quot;action&quot;:null,&quot;class&quot;:null}" data-component-name="ButtonCreateButton"><a class="button primary" href="https://blog.flatcircle.ai/subscribe?"><span>Subscribe now</span></a></p><p><em>If you would like to discuss incorporating LLMs into your research process, reply to this email or reach out via <a href="https://x.com/jdmoran">X</a> or <a href="https://www.linkedin.com/in/jamesdmoran/">LinkedIn</a>.</em></p>]]></content:encoded></item><item><title><![CDATA[Creative LLM use cases - Illusion of competence, management exploits and other vulnerabilities]]></title><description><![CDATA[Plus: GraphRAG vs RAG explained, update on trading arenas and four new workflows]]></description><link>https://blog.flatcircle.ai/p/creative-llm-use-cases-illusion-of</link><guid isPermaLink="false">https://blog.flatcircle.ai/p/creative-llm-use-cases-illusion-of</guid><dc:creator><![CDATA[Jim Moran]]></dc:creator><pubDate>Tue, 03 Feb 2026 19:54:06 GMT</pubDate><enclosure 
url="https://substackcdn.com/image/fetch/$s_!lmwM!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fcbf6097c-749b-4467-b76b-0c438624bb9e_850x616.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p><em>Flat Circle tracks creative use cases of LLMs in hedge funds. If you haven&#8217;t already, join hundreds of PMs, analysts and engineers reading each week:</em></p><p class="button-wrapper" data-attrs="{&quot;url&quot;:&quot;https://blog.flatcircle.ai/subscribe?&quot;,&quot;text&quot;:&quot;Subscribe now&quot;,&quot;action&quot;:null,&quot;class&quot;:null}" data-component-name="ButtonCreateButton"><a class="button primary" href="https://blog.flatcircle.ai/subscribe?"><span>Subscribe now</span></a></p><div><hr></div><h3>Vulnerabilities in LLM investment research</h3><p><strong>AllianceBernstein Chief AI Officer warns about management <a href="https://www.pwmnet.com/content/e4abbfcd-237b-43ec-9044-cf2f8f46ba69">altering phrasing to exploit LLM earnings-call summarizers</a>:</strong> </p><blockquote><p><em>&#8220;Companies know we are measuring sentiment, so they adjust. They&#8217;ve started using more positive words, even with bad news.&#8221; That forces investors to evolve. &#8220;If I focus on the prepared remarks, sentiment scores are high. But in the Q&amp;A, it&#8217;s much harder to control. That gives you a better read. </em>&#8220;It&#8217;s a cat-and-mouse game,&#8221; he adds. &#8220;You have to keep improving to continue to generate alpha.&#8221;</p></blockquote><p><strong>New research on <a href="https://www.arxiv.org/pdf/2601.13082">hidden text attacks in automated trading systems</a>. 
</strong>A vulnerability when your LLM web search agent discovers text hidden from human readers:</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://www.arxiv.org/pdf/2601.13082" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!lmwM!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fcbf6097c-749b-4467-b76b-0c438624bb9e_850x616.png 424w, https://substackcdn.com/image/fetch/$s_!lmwM!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fcbf6097c-749b-4467-b76b-0c438624bb9e_850x616.png 848w, https://substackcdn.com/image/fetch/$s_!lmwM!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fcbf6097c-749b-4467-b76b-0c438624bb9e_850x616.png 1272w, https://substackcdn.com/image/fetch/$s_!lmwM!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fcbf6097c-749b-4467-b76b-0c438624bb9e_850x616.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!lmwM!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fcbf6097c-749b-4467-b76b-0c438624bb9e_850x616.png" width="850" height="616" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/cbf6097c-749b-4467-b76b-0c438624bb9e_850x616.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:616,&quot;width&quot;:850,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:289304,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:&quot;https://www.arxiv.org/pdf/2601.13082&quot;,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:&quot;https://blog.flatcircle.ai/i/186747399?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fcbf6097c-749b-4467-b76b-0c438624bb9e_850x616.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!lmwM!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fcbf6097c-749b-4467-b76b-0c438624bb9e_850x616.png 424w, https://substackcdn.com/image/fetch/$s_!lmwM!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fcbf6097c-749b-4467-b76b-0c438624bb9e_850x616.png 848w, https://substackcdn.com/image/fetch/$s_!lmwM!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fcbf6097c-749b-4467-b76b-0c438624bb9e_850x616.png 1272w, https://substackcdn.com/image/fetch/$s_!lmwM!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fcbf6097c-749b-4467-b76b-0c438624bb9e_850x616.png 1456w" sizes="100vw" fetchpriority="high"></picture></div></a></figure></div><p>My take: As investors increasingly leverage LLMs, the market will respond with new attempts to manipulate them. Point72 just posted a role for a <a href="https://job-boards.greenhouse.io/point72/jobs/8399360002">GenAI Security Engineer</a>. <strong>We are experimenting with a few approaches here, so if you&#8217;re interested in this problem, please reach out.</strong></p><h3>Another LLM risk: Illusion of competence</h3><p>Two LP letters issue similar warnings on the increasing use of LLMs in investment research:</p><p><a href="https://okeefestevens.com/quarterly-investor-letter-q4-2025/">O&#8217;Keefe Stevens - 4Q 2025 Investor Letter</a>: </p><blockquote><p><em>The more concerning dynamic is the illusion of competence. 
There is a risk that access to more contextually rich output leads to overconfidence in areas where the user lacks actual domain expertise. Nowhere is this more dangerous than in highly regulated, technically complex industries like healthcare or energy, where surface understanding is insufficient for investment decision-making. We expect many market participants to expand into unfamiliar sectors with misplaced confidence, armed with tools that enhance comprehension but not judgment</em></p></blockquote><p><a href="https://imonkey-files.s3-us-west-1.amazonaws.com/4Q25-Upslope%20Capital%20Management.pdf">Upslope Capital - 4Q 2025 Investor Letter</a></p><blockquote><p><em>AI is also everywhere &#8211; particularly on the desktops of buyside analysts and PMs. I suspect this technological shift is part of a not-so-virtuous cycle with the cultural shift towards gambling. A couple years ago legendary investor Stan Druckenmiller noted how he made a quick bet on Argentinian stocks with an assist from AI: &#8216;&#8230;do you want to hear how I invested in Argentina? It&#8217;s a funny story&#8230;I saw the speech in Davos and it was about 1:00 in the afternoon in my office. I dialed up Perplexity [AI] and I said, give me the five most liquid ADRs in Argentina&#8230;It gave me enough of a description that I follow the old Soros rule, invest and then investigate. I bought all of them. We did some work on them. 
I increased my positions and so far, it&#8217;s been great.&#8217;</em></p></blockquote><h3>More investor workflows</h3><p><strong>TMT / Energy investor lays out how he orchestrates sub-agents using the &#8220;Great Architect&#8221; framework (<a href="https://x.com/TheValueist/status/2017975267713589430?s=20">@TheValueist</a>)</strong></p><p><strong>Head of AI at Manulife shares strategies to drive internal adoption (<a href="https://www.ai-street.co/p/inside-manulifes-early-ai-adoption">AI Street</a>)</strong></p><p><strong>Former Capital Group partner and founder of a new LLM-driven hedge fund uses LLMs to analyze 2026 outlooks from the top asset managers (<a href="https://www.linkedin.com/feed/update/urn:li:activity:7418004045451382784/">LinkedIn</a>)</strong></p><blockquote><p><em>We used a language model to extract thousands of individual statements and organize them by topic and time horizon. Similar ideas were grouped and weighted by how often they appeared across independent firms&#8230;On the environment, there was broad agreement&#8230;.Firms disagreed on where returns are most likely to come from, how durable US market leadership will be, the timing and impact of policy easing, and how investable AI is at current valuations. These differences were not about facts. They reflected judgment&#8230;</em></p></blockquote><p><strong>Slides from new NYU Stern course on AI in Finance (<a href="https://arpitrage.substack.com/p/1-three-rules-for-ai-in-finance">Substack</a>, <a href="https://github.com/arpitrage/ai-in-finance">Slides</a>)</strong></p><h3>RAG vs GraphRAG</h3><p>The platform funds are building AI teams to develop workflows for their PMs and analysts. One of the top required skills for these teams is advanced retrieval-augmented generation (RAG), including techniques such as <a href="https://www.microsoft.com/en-us/research/blog/graphrag-unlocking-llm-discovery-on-narrative-private-data/">GraphRAG</a>, an open-source framework developed by Microsoft Research. 
Example job post:</p><p><strong><a href="https://www.linkedin.com/jobs/view/senior-genai-engineer-%E2%80%93-advanced-rag-at-millennium-4330058372/">Millennium: Senior GenAI Engineer - Advanced RAG</a></strong></p><blockquote><p><em>Enrichments and Knowledge Graph Construction: Move beyond flat vector search by building GraphRAG systems and advanced annotations such topics, keywords, sentiment, etc. You will extract entities (Companies, People, Metrics) and relationships from text to build a dynamic Knowledge Graph that captures the nuance of the financial markets and its temporal aspects.</em></p></blockquote><p>Basic vector RAG, which you would experience by attaching files to ChatGPT or NotebookLM, searches documents for relevant excerpts and simply attaches them as context to your prompt. GraphRAG indexes your corpus into entities and relationships, then uses that structure to synthesize answers that aren&#8217;t obvious from any single chunk. Funds are hiring for these knowledge-graph retrieval frameworks because they see their edge buried in writeups, interviews, surveys, notes and other longform text and want to maximize second-level insights. 
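</p><p>To make the contrast concrete, here is a toy sketch in Python. It is not Microsoft&#8217;s GraphRAG implementation: the company names, notes and triples are made up, word overlap stands in for vector similarity, and the triples are hard-coded where a real pipeline would have an LLM extract them from the text.</p>

```python
from collections import defaultdict, deque

# Hypothetical research notes; each string stands in for one retrievable chunk.
CHUNKS = [
    "Acme Corp sources its batteries from VoltCell.",
    "VoltCell operates a single plant in Taiwan.",
    "Acme Corp reported strong quarterly revenue.",
]

# Hand-written (subject, relation, object) triples. Assumption: a real pipeline
# would have an LLM extract these from CHUNKS; they are hard-coded here.
TRIPLES = [
    ("Acme Corp", "sources batteries from", "VoltCell"),
    ("VoltCell", "operates plant in", "Taiwan"),
]


def flat_retrieve(query, chunks, top_k=1):
    """Rank chunks by shared lowercase words, a crude stand-in for vector similarity."""
    query_words = set(query.lower().replace("?", "").split())
    ranked = sorted(
        chunks,
        key=lambda chunk: len(
            query_words.intersection(chunk.lower().rstrip(".").split())
        ),
        reverse=True,
    )
    return ranked[:top_k]


def build_graph(triples):
    """Index triples as an adjacency list mapping entity to (relation, neighbor) pairs."""
    graph = defaultdict(list)
    for subject, relation, obj in triples:
        graph[subject].append((relation, obj))
    return graph


def find_path(graph, start, goal):
    """Breadth-first search returning the chain of triples linking start to goal."""
    queue = deque([(start, [])])
    seen = {start}
    while queue:
        node, path = queue.popleft()
        if node == goal:
            return path
        for relation, neighbor in graph.get(node, []):
            if neighbor not in seen:
                seen.add(neighbor)
                queue.append((neighbor, path + [(node, relation, neighbor)]))
    return None  # no chain of relationships connects the two entities


question = "What is the Taiwan exposure of Acme Corp?"
top_chunk = flat_retrieve(question, CHUNKS)
path = find_path(build_graph(TRIPLES), "Acme Corp", "Taiwan")
```

<p>In this toy setup the flat retriever returns the note about Acme Corp&#8217;s battery supplier, which never mentions Taiwan, while the graph walk chains two notes together (Acme Corp to VoltCell to Taiwan) to surface the exposure.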
</p><h3>Update on trading arenas</h3><p>Soon to be released Grok 4.2 is the only model making money trading weather prediction markets (<a href="https://www.predictionarena.ai/">PredictionArena.ai</a>) </p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://www.predictionarena.ai/" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!-n9q!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fba6cef10-eb85-42f7-821d-7663b61abe94_1810x1204.png 424w, https://substackcdn.com/image/fetch/$s_!-n9q!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fba6cef10-eb85-42f7-821d-7663b61abe94_1810x1204.png 848w, https://substackcdn.com/image/fetch/$s_!-n9q!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fba6cef10-eb85-42f7-821d-7663b61abe94_1810x1204.png 1272w, https://substackcdn.com/image/fetch/$s_!-n9q!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fba6cef10-eb85-42f7-821d-7663b61abe94_1810x1204.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!-n9q!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fba6cef10-eb85-42f7-821d-7663b61abe94_1810x1204.png" width="1456" height="969" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/ba6cef10-eb85-42f7-821d-7663b61abe94_1810x1204.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:969,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:246883,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:&quot;https://www.predictionarena.ai/&quot;,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://blog.flatcircle.ai/i/186747399?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fba6cef10-eb85-42f7-821d-7663b61abe94_1810x1204.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!-n9q!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fba6cef10-eb85-42f7-821d-7663b61abe94_1810x1204.png 424w, https://substackcdn.com/image/fetch/$s_!-n9q!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fba6cef10-eb85-42f7-821d-7663b61abe94_1810x1204.png 848w, https://substackcdn.com/image/fetch/$s_!-n9q!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fba6cef10-eb85-42f7-821d-7663b61abe94_1810x1204.png 1272w, https://substackcdn.com/image/fetch/$s_!-n9q!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fba6cef10-eb85-42f7-821d-7663b61abe94_1810x1204.png 1456w" sizes="100vw" loading="lazy"></picture></div></a></figure></div>
<p>In another arena featuring Grok 4.2, <a href="https://nof1.ai/">Alpha Arena</a>, xAI&#8217;s forthcoming model won as well. 
We cover trading arenas in more detail in <a href="https://blog.flatcircle.ai/p/can-llms-make-investment-decisions">an earlier post</a>.</p><h3><strong>Follow for more investor LLM workflows</strong></h3><p class="button-wrapper" data-attrs="{&quot;url&quot;:&quot;https://blog.flatcircle.ai/subscribe?&quot;,&quot;text&quot;:&quot;Subscribe now&quot;,&quot;action&quot;:null,&quot;class&quot;:null}" data-component-name="ButtonCreateButton"><a class="button primary" href="https://blog.flatcircle.ai/subscribe?"><span>Subscribe now</span></a></p><p><em>If you would like to discuss incorporating LLMs into your research process, reply to this email or reach out via <a href="https://x.com/jdmoran">X</a> or <a href="https://www.linkedin.com/in/jamesdmoran/">LinkedIn</a>.</em></p>]]></content:encoded></item><item><title><![CDATA[Creative LLM use cases - Prompts for evasion, switching costs, risk arb, and hidden real estate value]]></title><description><![CDATA[Plus: updates on AI expert network interviews, LLM trading arenas and more]]></description><link>https://blog.flatcircle.ai/p/creative-llm-use-cases-prompts-for</link><guid isPermaLink="false">https://blog.flatcircle.ai/p/creative-llm-use-cases-prompts-for</guid><dc:creator><![CDATA[Jim Moran]]></dc:creator><pubDate>Mon, 26 Jan 2026 14:57:49 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!1xI6!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F81e5281f-bf5e-4010-b520-6bf759f9ddbc_1166x646.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p><em>Flat Circle tracks creative use cases of LLMs in hedge funds. 
If you haven&#8217;t already, join hundreds of PMs, analysts and engineers reading each week:</em></p><p class="button-wrapper" data-attrs="{&quot;url&quot;:&quot;https://blog.flatcircle.ai/subscribe?&quot;,&quot;text&quot;:&quot;Subscribe now&quot;,&quot;action&quot;:null,&quot;class&quot;:null}" data-component-name="ButtonCreateButton"><a class="button primary" href="https://blog.flatcircle.ai/subscribe?"><span>Subscribe now</span></a></p><div><hr></div><h3>Detecting evasion</h3><p><strong>Generation IM ($21B AUM) shares a couple of workflows in <a href="https://www.generationim.com/media/wfanbyn4/generation-im-q4-2025-global-equity-investor-letter.pdf">their 4Q25 letter</a></strong>:</p><blockquote><p><em>&#8220;Our &#8216;First Looks&#8217; initiative serves a different function. When analysts evaluate a new company, we leverage AI to provide a snapshot overview with green, yellow and red flags drawn from sources like Glassdoor reviews, which used to take hours of manual work. Finally, our &#8216;Deception Detection&#8217; dashboard analyses earnings call transcripts across the portfolio, flagging watchlist topics and potential areas for forensic accounting review&#8221;</em></p></blockquote><p><strong>New paper features prompt and system design for identifying evasion</strong>: <a href="https://arxiv.org/pdf/2601.09142">EvasionBench: Detecting evasive answers in financial Q&amp;A via multi-model consensus and LLM-as-Judge</a>. The key is to score each response on the degree to which management (i) answers the specific question, (ii) introduces irrelevant framing, (iii) relies on generalities, and (iv) deflects. Their eval is a set of 1,000 human-annotated scores. 
The prompt is on page 12.</p><h3>Automated expert network interviews</h3><p>A couple of things this week:</p><ul><li><p>Tech investor shares his experience with <a href="https://x.com/TechFundies/status/2014393459181068379">AlphaSense&#8217;s new AI-led interview product</a>, with example outputs</p></li><li><p>Former Chief Data Scientist at Third Point covers Ribbon.ai&#8217;s new expert-network-in-a-box: <a href="https://www.mattober.co/p/biggest-disruption-to-expert-networks-since-tegus">The biggest disruption to expert networks since Tegus</a></p></li></ul><blockquote><p><em>You can now source 1B+ experts leveraging their tool and then instantly book an expert call leveraging their voice AI, which then generates a transcript, which you can then leverage all their tools to analyze the transcript. All of this is white-label and available via API.</em></p></blockquote><ul><li><p>Expert networks using AI interviews include: <a href="https://www.alpha-sense.com/">AlphaSense</a>, <a href="https://expertinsights.com/">Expert Insights</a>, <a href="https://www.guidepoint.com/what-we-offer/ai-moderation/">Guidepoint</a>, <a href="https://qualitate.io/">Qualitate</a>, <a href="https://www.ribbon.ai/">Ribbon.ai</a>, <a href="https://www.synquery.ai/">Synquery</a> (email me any I&#8217;m missing, and I&#8217;ll compile and circulate a longer list)</p></li><li><p>I&#8217;ve also heard of multiple funds building this internally</p></li></ul><p>My take: I&#8217;ve been <a href="https://blog.flatcircle.ai/i/182452992/expert-interviews-again">skeptical of AI-led interviews</a> because I worried they&#8217;d over-index on lower-quality &#8220;professional experts&#8221; and wouldn&#8217;t create the chemistry that elicits deeper conversation. While that is definitely a limitation, I underappreciated how much better this experience is for the expert: an LLM is available around your schedule, and it is never rude to you and never cancels on you. And there&#8217;s no chitchat. 
Also, what keeps investors from doing more interviews is time and mental energy, not so much money. LLM-driven interviews may dramatically increase both the supply of and demand for expert interviews this year.</p><h3>Software switching costs</h3><p><strong>Former long/short PM shares his ChatGPT thread on finding low mission-critical software (<a href="https://x.com/atelicinvest/status/2013347902534623656">@atelicinvest</a>):</strong></p><blockquote><p><em>Prompt: Help me build a first-principles framework to identify low mission-critical software by analyzing integration depth, switching costs, compliance exposure, workflow impact, and behavioral indicators (e.g., low engagement, discounting, promotions), without relying on retention metrics.</em></p></blockquote><h3>More investor workflows</h3><p><strong>Former Healthcare PM vibecodes datasets his internal data team would never have prioritized</strong> (<a href="https://x.com/FundamentEdge/status/2015637398529671507">@FundamentEdge</a>)</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://x.com/FundamentEdge/status/2015637398529671507" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!1xI6!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F81e5281f-bf5e-4010-b520-6bf759f9ddbc_1166x646.png 424w, https://substackcdn.com/image/fetch/$s_!1xI6!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F81e5281f-bf5e-4010-b520-6bf759f9ddbc_1166x646.png 848w, https://substackcdn.com/image/fetch/$s_!1xI6!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F81e5281f-bf5e-4010-b520-6bf759f9ddbc_1166x646.png 1272w, 
https://substackcdn.com/image/fetch/$s_!1xI6!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F81e5281f-bf5e-4010-b520-6bf759f9ddbc_1166x646.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!1xI6!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F81e5281f-bf5e-4010-b520-6bf759f9ddbc_1166x646.png" width="1166" height="646" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/81e5281f-bf5e-4010-b520-6bf759f9ddbc_1166x646.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:646,&quot;width&quot;:1166,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:172451,&quot;alt&quot;:&quot;&quot;,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:&quot;https://x.com/FundamentEdge/status/2015637398529671507&quot;,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://blog.flatcircle.ai/i/185737197?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F81e5281f-bf5e-4010-b520-6bf759f9ddbc_1166x646.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" title="" srcset="https://substackcdn.com/image/fetch/$s_!1xI6!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F81e5281f-bf5e-4010-b520-6bf759f9ddbc_1166x646.png 424w, https://substackcdn.com/image/fetch/$s_!1xI6!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F81e5281f-bf5e-4010-b520-6bf759f9ddbc_1166x646.png 848w, 
https://substackcdn.com/image/fetch/$s_!1xI6!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F81e5281f-bf5e-4010-b520-6bf759f9ddbc_1166x646.png 1272w, https://substackcdn.com/image/fetch/$s_!1xI6!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F81e5281f-bf5e-4010-b520-6bf759f9ddbc_1166x646.png 1456w" sizes="100vw" loading="lazy"></picture></div></a></figure></div><p>My take: Brett also mentions the notion of &#8220;artisanal workflows&#8221; which I think is a good concept here: everyone&#8217;s 
vibecoded research systems are going to be unique.</p><p><strong>Fintool founder shares lessons from two years building investment research agents.</strong> Very thorough, technical and good (<a href="https://x.com/nicbstme/status/2015174818497437834">@Nicbstme</a>)</p><p><strong>Prompt for finding risk arb filings (<a href="https://x.com/RodAlzmann/status/2015230704334840218">@RodAlzmann</a>).</strong></p><p><strong>Prompt for first- and second-order impacts of tariffs, sanctions and export controls in a <a href="https://globalcapitalallocation.s3.us-east-2.amazonaws.com/CCMS_AI_Draft.pdf">research paper on geoeconomic pressure</a></strong>. It starts on page 10.</p><p><strong>Tool using Opus 4.5 to estimate hidden real estate value</strong> (<a href="https://x.com/AltayCapital/status/2015364717636919557">@AltayCapital</a>)</p><p><strong>TMT and Energy investor shares experience building a 10-K deep dive tool</strong> (<a href="https://x.com/TheValueist/status/2015156202695545312">@TheValueist</a>)</p><h3>Interesting papers</h3><p><strong><a href="https://arxiv.org/pdf/2511.15593">What does it take to be a good AI research agent? Studying the role of ideation diversity</a>. </strong>Two good things in here: (i) as models get better at using tools, LLM systems designed for greater diversity of ideas will outperform, and (ii) changing temperature doesn&#8217;t really help.</p><p><strong><a href="https://arxiv.org/pdf/2601.15247">Taxonomy-aligned risk extraction from 10-K filings with autonomous improvement using LLMs</a>. </strong>It is written by the team behind the AI tool <a href="https://massive.com/">Massive</a>, so they don&#8217;t share the prompt, but it&#8217;s a good framework for classifying a large universe of companies into a customized set of risk factors. A related tool was recently released on X (<a href="https://x.com/stockthoughts81/status/2012205464373625144?s=20">@JaredKubin</a>). 
</p><h3>Updates from AI trading arenas</h3><p>These are public experiments where LLMs make trading decisions. <a href="https://blog.flatcircle.ai/p/can-llms-make-investment-decisions">Longer summary of this trend is here</a>; the takeaway is that all trading robots eventually lose money, but they are worth monitoring because the newer, more expensive models are starting to lose less money.</p><p><strong><a href="https://www.asterdex-testnet.com/en/campaigns/human-vs-ai">Aster - Humans vs AI</a>:</strong> traders compete with robots for a $150K prize pool. Right now the AIs are beating the humans on average, but nine out of the top ten traders are human. This feels about right!</p><p><strong><a href="https://arena.okbet.trade/">Okbet Arena</a></strong>: Five models compete at placing bets on Polymarket. All models are losing money, and right now GPT-5.1 is in the lead while DeepSeek R1 is in last place.</p><p><strong><a href="https://openfinarena.com/fin-deep-forecast">FinDeepForecast</a>:</strong> Live benchmark based on a new paper: <a href="https://arxiv.org/abs/2601.05039">FinDeepForecast: A live multi-agent system for benchmarking deep research agents in financial forecasting</a>. 
Basically identical results to Okbet Arena.</p><h3><strong>Follow for more investor prompts and workflows </strong></h3><p class="button-wrapper" data-attrs="{&quot;url&quot;:&quot;https://blog.flatcircle.ai/subscribe?&quot;,&quot;text&quot;:&quot;Subscribe now&quot;,&quot;action&quot;:null,&quot;class&quot;:null}" data-component-name="ButtonCreateButton"><a class="button primary" href="https://blog.flatcircle.ai/subscribe?"><span>Subscribe now</span></a></p><p><em>If you would like to discuss incorporating LLMs into your research process, reply to this email or reach out via <a href="https://x.com/jdmoran">X</a> or <a href="https://www.linkedin.com/in/jamesdmoran/">LinkedIn</a>.</em></p>]]></content:encoded></item><item><title><![CDATA[Creative LLM use cases - Podcast agent, investor letters, synthetic panels]]></title><description><![CDATA[Plus 2 interesting job descriptions and 1 new tool]]></description><link>https://blog.flatcircle.ai/p/creative-llm-use-cases-podcast-agent</link><guid isPermaLink="false">https://blog.flatcircle.ai/p/creative-llm-use-cases-podcast-agent</guid><dc:creator><![CDATA[Jim Moran]]></dc:creator><pubDate>Thu, 15 Jan 2026 22:28:07 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!hu9u!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1c48ac03-3f89-4f0d-8d08-e5a8af5a0215_966x517.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p><em>Flat Circle tracks creative use cases of LLMs in hedge funds. 
If you haven&#8217;t already, join hundreds of PMs, analysts and engineers reading each week:</em></p><p class="button-wrapper" data-attrs="{&quot;url&quot;:&quot;https://blog.flatcircle.ai/subscribe?&quot;,&quot;text&quot;:&quot;Subscribe now&quot;,&quot;action&quot;:null,&quot;class&quot;:null}" data-component-name="ButtonCreateButton"><a class="button primary" href="https://blog.flatcircle.ai/subscribe?"><span>Subscribe now</span></a></p><div><hr></div><h3>Interesting use cases</h3><p><strong>Have an LLM listen to every industry podcast and alert you to anything relevant <a href="https://podcast.flatcircle.ai/">using our new Podcast Agent</a></strong></p><p>26 million podcast episodes are published each year. Many are interviews with management teams from public companies, their competitors, customers and suppliers. Based on interest from several funds, we built a simple tool that listens to every podcast and alerts you to anything incremental to your coverage. A couple of examples from this week:</p><ul><li><p>ORCL: VP of a 17-hospital system discusses their recent decision to transition off Oracle Health (<a href="https://podcast.flatcircle.ai/?ticker=ORCL">Becker&#8217;s Healthcare Podcast</a>)</p></li><li><p>COTY, ULTA: Ulta&#8217;s SVP of Ecom discusses the company&#8217;s new marketplace and how it plans to bring emerging brands into its brick-and-mortar channels (<a href="https://podcast.flatcircle.ai/?ticker=COTY">Omni Talk Retail</a>)</p></li></ul><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="http://podcast.flatcircle.ai" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!hu9u!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1c48ac03-3f89-4f0d-8d08-e5a8af5a0215_966x517.png 424w, 
https://substackcdn.com/image/fetch/$s_!hu9u!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1c48ac03-3f89-4f0d-8d08-e5a8af5a0215_966x517.png 848w, https://substackcdn.com/image/fetch/$s_!hu9u!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1c48ac03-3f89-4f0d-8d08-e5a8af5a0215_966x517.png 1272w, https://substackcdn.com/image/fetch/$s_!hu9u!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1c48ac03-3f89-4f0d-8d08-e5a8af5a0215_966x517.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!hu9u!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1c48ac03-3f89-4f0d-8d08-e5a8af5a0215_966x517.png" width="966" height="517" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/1c48ac03-3f89-4f0d-8d08-e5a8af5a0215_966x517.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:517,&quot;width&quot;:966,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:130529,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:&quot;http://podcast.flatcircle.ai&quot;,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:&quot;https://blog.flatcircle.ai/i/184690881?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1c48ac03-3f89-4f0d-8d08-e5a8af5a0215_966x517.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" 
srcset="https://substackcdn.com/image/fetch/$s_!hu9u!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1c48ac03-3f89-4f0d-8d08-e5a8af5a0215_966x517.png 424w, https://substackcdn.com/image/fetch/$s_!hu9u!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1c48ac03-3f89-4f0d-8d08-e5a8af5a0215_966x517.png 848w, https://substackcdn.com/image/fetch/$s_!hu9u!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1c48ac03-3f89-4f0d-8d08-e5a8af5a0215_966x517.png 1272w, https://substackcdn.com/image/fetch/$s_!hu9u!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1c48ac03-3f89-4f0d-8d08-e5a8af5a0215_966x517.png 1456w" sizes="100vw" fetchpriority="high"></picture></div></a></figure></div>
<p>We plan to add more sources of content to this over time. <a href="https://podcast.flatcircle.ai/">Sign up for free</a>.</p><p><strong>Askeladden Capital discusses its process for <a href="https://askeladdencapital.com/wp-content/uploads/2025/10/2025-08-10-Askeladden-Capital-Q2-2024-Letter-10X.pdf">ingesting and scoring fund LP letters</a></strong></p><blockquote><p><em>I&#8217;ve built a tool that reads investment letters of other fund managers, ignore all macro / philosophical discussion, extract only single-ticker investment ideas, summarize them, and scores them against a rubric based on our historical priorities. That rubric was &#8211; drumroll &#8211; drafted by AI after reading years of our letters, then subsequently refined by me. </em></p></blockquote><p>My take: If you&#8217;re building this agent for your process, <a href="https://www.joinyellowbrick.com/">Yellowbrick</a> is a pretty good data source for investor theses.</p><p><strong>Prompt from Reddit commenter to analyze 10Ks/Qs for <a href="https://www.reddit.com/r/ValueInvesting/comments/1pdyg23/comment/nsairj8/">changes in forward-looking statements</a>.</strong></p><p>My take: Agree with the focus on using an LLM to extract verbatim language rather than return a conclusion.</p><h3>Interesting papers</h3><p><strong><a href="https://arxiv.org/abs/2510.08338">LLMs Reproduce Human Purchase Intent via Semantic Similarity Elicitation of Likert Ratings</a>.</strong> Surveys a panel of LLMs pretending to be shoppers and compares results against real panelists, finding a 90% overlap. 
Prompts are discussed in the paper and on the authors&#8217; <a href="https://github.com/pymc-labs/semantic-similarity-rating">GitHub</a>.</p><p>My take: There are many synthetic panel research vendors, including <a href="https://aaru.com/">Aaru</a>, <a href="https://www.electrictwin.com/">Electric Twin</a> and <a href="https://www.qualtrics.com/strategy/audiences/">Qualtrics</a>. I think synthetic panels work almost as well as human panels, which themselves are not that great. The best ones are calibrated against real panel and purchasing data. I&#8217;m not clear why a panel of LLMs is better than a system leveraging a single LLM, or even what the difference is. Synthetic panels are also unlikely to produce entirely new findings the way human panels can. They also might be more useful to companies seeking feedback on a specific product than to investors looking for fresh, on-the-ground information. There&#8217;s a lot of interest in this space, so I will explore it more and report back with anything interesting.</p><h3>New tools</h3><p><strong><a href="https://x.com/omooretweets/status/2011141771636740251?s=20">SimilarWeb has started offering 12 months of data free through Manus paid and free plans</a></strong></p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://x.com/omooretweets/status/2011141771636740251?s=20" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!maTj!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F303e9d17-28ca-4b24-a331-b3afe3e8bc94_599x530.png 424w, https://substackcdn.com/image/fetch/$s_!maTj!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F303e9d17-28ca-4b24-a331-b3afe3e8bc94_599x530.png 848w, 
https://substackcdn.com/image/fetch/$s_!maTj!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F303e9d17-28ca-4b24-a331-b3afe3e8bc94_599x530.png 1272w, https://substackcdn.com/image/fetch/$s_!maTj!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F303e9d17-28ca-4b24-a331-b3afe3e8bc94_599x530.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!maTj!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F303e9d17-28ca-4b24-a331-b3afe3e8bc94_599x530.png" width="599" height="530" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/303e9d17-28ca-4b24-a331-b3afe3e8bc94_599x530.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:530,&quot;width&quot;:599,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:110550,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:&quot;https://x.com/omooretweets/status/2011141771636740251?s=20&quot;,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://blog.flatcircle.ai/i/184690881?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F303e9d17-28ca-4b24-a331-b3afe3e8bc94_599x530.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!maTj!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F303e9d17-28ca-4b24-a331-b3afe3e8bc94_599x530.png 424w, 
https://substackcdn.com/image/fetch/$s_!maTj!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F303e9d17-28ca-4b24-a331-b3afe3e8bc94_599x530.png 848w, https://substackcdn.com/image/fetch/$s_!maTj!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F303e9d17-28ca-4b24-a331-b3afe3e8bc94_599x530.png 1272w, https://substackcdn.com/image/fetch/$s_!maTj!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F303e9d17-28ca-4b24-a331-b3afe3e8bc94_599x530.png 1456w" sizes="100vw" loading="lazy"></picture></div></a></figure></div><p>My take: SimilarWeb already offers an MCP server that lets existing customers access its data via LLM. The difference here is the move to share data for free as lead gen. I think we&#8217;ll see a lot more of this, especially in relatively commoditized segments like web traffic data. Worth noting this was announced two weeks after Meta acquired Manus for more than $2b.</p><h3>Interesting job descriptions</h3><p><strong><a href="https://job-boards.greenhouse.io/alphasense/jobs/8355987002">AlphaSense/Tegus is hiring a hedge fund LLM workflows product manager</a></strong></p><blockquote><p><em>your value lies in your ~5 years of experience at a top-tier Hedge Fund&#8230;You will be the primary architect of &#8220;quality.&#8221; Design, test, and refine prompts to ensure our AI output meets the high standards of a professional investor. You will look at an AI summary or extraction and immediately know if it &#8220;sounds right&#8221; to a PM or Analyst</em></p></blockquote><p><strong><a href="https://www.linkedin.com/jobs/view/senior-genai-engineer-%E2%80%93-advanced-rag-at-millennium-4330058372/">Millennium is hiring a GenAI engineer for advanced RAG</a></strong></p><h3><strong>Follow for more case studies of LLMs+Investment Research</strong></h3><p><em>If you would like to discuss incorporating LLMs into your research process, reply to this email or reach out via <a href="https://x.com/jdmoran">X</a> or <a href="https://www.linkedin.com/in/jamesdmoran/">LinkedIn</a>.</em></p>]]></content:encoded></item><item><title><![CDATA[Creative LLM use cases - Evaluating new CEOs, summarizing Bloomberg IB chats, charting investor narratives]]></title><description><![CDATA[Hedge fund case studies, tools and hiring trends]]></description><link>https://blog.flatcircle.ai/p/creative-llm-use-cases-evaluating</link><guid 
isPermaLink="false">https://blog.flatcircle.ai/p/creative-llm-use-cases-evaluating</guid><dc:creator><![CDATA[Jim Moran]]></dc:creator><pubDate>Fri, 09 Jan 2026 18:29:04 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!Uwve!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7ba3d885-a1ee-4884-993e-b59e5cdf302a_1016x553.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p><em>Flat Circle tracks creative use cases of LLMs in hedge funds. If you haven&#8217;t already, join hundreds of PMs, analysts and engineers reading each week:</em></p><p class="button-wrapper" data-attrs="{&quot;url&quot;:&quot;https://blog.flatcircle.ai/subscribe?&quot;,&quot;text&quot;:&quot;Subscribe now&quot;,&quot;action&quot;:null,&quot;class&quot;:null}" data-component-name="ButtonCreateButton"><a class="button primary" href="https://blog.flatcircle.ai/subscribe?"><span>Subscribe now</span></a></p><div><hr></div><h3>Case studies</h3><p><strong>Former PM at Schonfeld, Citadel and DE Shaw shares his prompt for evaluating a new CEO</strong> (<a href="https://x.com/FundamentEdge/status/1914694781462343979">@FundamentEdge</a>)</p><p>My take: I like Brett&#8217;s prompt because it first asks the model to do a lot of work understanding the history of the situation and what levers might be available to new management. That&#8217;s important for sizing the opportunity, but also for letting the user see whether the model&#8217;s understanding of the history aligns with their own.</p><p><strong>Australian hedge fund Minotaur Capital says new Claude Opus 4.5 model outperforms OpenAI on its internal research benchmarks</strong> (<a href="https://www.minotaurcapital.com/reports/monthly/2025-12">LP Letter</a>)</p><blockquote><p><em>On the technology front, we&#8217;ve spoken extensively about our disciplined framework for testing and evaluating large language models across different use cases. 
One of the challenges is how quickly the frontier shifts. Recently, we&#8217;ve been testing Codex versus Claude for writing and research tasks and have found that Claude (Opus 4.5) is currently delivering superior results. As a result, we&#8217;ve migrated a meaningful portion of our internal research workflows accordingly.</em></p></blockquote><p>My take: New models come out frequently and even the same model can vary in performance week to week. Maintaining an objective benchmark for research summaries is necessary to ensure you&#8217;re always using the highest quality model. One way I&#8217;ve seen this done is having the top LLMs summarize every earnings press release as soon as it&#8217;s posted, then later scoring each summary against sellside recaps. That gives a decent proxy for the best model at junior-analyst-type work.</p><p><strong>JPMorgan is cutting ties with proxy advisory firms and will use in-house AI to cast shareholder votes</strong> (<a href="https://www.wsj.com/finance/banking/jpmorgan-cuts-all-ties-with-proxy-advisers-in-industry-first-78c43d5f">WSJ</a>)</p><blockquote><p><em>The bank will use the platform to manage the votes and the AI also will analyze data from more than 3,000 annual company meetings and provide recommendations to the portfolio managers, the memo said, replacing the typical roles of proxy advisers.</em></p></blockquote><p>My take: Glass Lewis and ISS are a duopoly because they aggregate voting power on behalf of their institutional clients. 
LLMs will cost them pricing power, but I don&#8217;t see most asset managers following suit and building their own.</p><h3>New tools</h3><p><strong>New tool uses Gemini to chart the cultural prominence of various narratives over time</strong> (<a href="https://www.eigencultures.com/?dataset=beauty_clusters_1880_2020&amp;clusters=24_financialization%2C17_consumerist_hedonism">Cultural Eigenclusters</a>, <a href="https://www.thediff.co/archive/longreads-open-thread-163/">The Diff</a>)</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://www.eigencultures.com/?dataset=beauty_clusters_1880_2020&amp;clusters=24_financialization%2C17_consumerist_hedonism" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!Uwve!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7ba3d885-a1ee-4884-993e-b59e5cdf302a_1016x553.png 424w, https://substackcdn.com/image/fetch/$s_!Uwve!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7ba3d885-a1ee-4884-993e-b59e5cdf302a_1016x553.png 848w, https://substackcdn.com/image/fetch/$s_!Uwve!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7ba3d885-a1ee-4884-993e-b59e5cdf302a_1016x553.png 1272w, https://substackcdn.com/image/fetch/$s_!Uwve!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7ba3d885-a1ee-4884-993e-b59e5cdf302a_1016x553.png 1456w" sizes="100vw"><img 
src="https://substackcdn.com/image/fetch/$s_!Uwve!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7ba3d885-a1ee-4884-993e-b59e5cdf302a_1016x553.png" width="1016" height="553" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/7ba3d885-a1ee-4884-993e-b59e5cdf302a_1016x553.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:553,&quot;width&quot;:1016,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:91336,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:&quot;https://www.eigencultures.com/?dataset=beauty_clusters_1880_2020&amp;clusters=24_financialization%2C17_consumerist_hedonism&quot;,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://blog.flatcircle.ai/i/183956975?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7ba3d885-a1ee-4884-993e-b59e5cdf302a_1016x553.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!Uwve!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7ba3d885-a1ee-4884-993e-b59e5cdf302a_1016x553.png 424w, https://substackcdn.com/image/fetch/$s_!Uwve!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7ba3d885-a1ee-4884-993e-b59e5cdf302a_1016x553.png 848w, https://substackcdn.com/image/fetch/$s_!Uwve!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7ba3d885-a1ee-4884-993e-b59e5cdf302a_1016x553.png 1272w, 
https://substackcdn.com/image/fetch/$s_!Uwve!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7ba3d885-a1ee-4884-993e-b59e5cdf302a_1016x553.png 1456w" sizes="100vw" loading="lazy"></picture></div></a></figure></div><p>My take: I think something similar can be combined with price data to map the narratives around assets. 
Reach out if you&#8217;re interested in exploring.</p><h3>Interesting job descriptions</h3><p><strong>Soros is hiring an AI Orchestration Engineer</strong> (<a href="https://www.linkedin.com/jobs/view/ai-orchestration-engineer-at-soros-fund-management-4330376981/">LinkedIn</a>)</p><blockquote><p><em>Examples projects include tonal analysis of earnings calls or summarization of Bloomberg IB chats.</em></p></blockquote><p><strong>Verso Partners, a $600mm AUM SF-based hedge fund, is hiring a founding AI product engineer</strong> (<a href="https://imagine.jhu.edu/jobs/verso-partners-founding-ai-product-engineer/">Johns Hopkins</a>)</p><blockquote><p><em>Much of our investment research process is qualitative: expert interviews, reconstructing industry history, understanding how a company got to where it is, pressure-testing narratives, and tracking what would change our mind. The challenge is that the raw material of great research is vast and messy &#8211; notes, transcripts, filings, models, datasets, internal memos, and trade history.</em></p><p><em>Your job will be to build a <strong>research and decision-support &#8220;OS&#8221;</strong> that helps us:</em></p><ul><li><p><em>execute our research process more effectively,</em></p></li><li><p><em>integrate insights across disparate data sources,</em></p></li><li><p><em>spot biases and patterns in our own analyses and trading behavior,</em></p></li><li><p><em>and ultimately <strong>ask better questions, see the ball more clearly, and make better investment decisions.</strong></em></p></li></ul></blockquote><p><strong>OpenAI is hiring a large cap research analyst from the buyside or sellside</strong> (<a href="https://www.linkedin.com/posts/activity-7414444992657534976-9NCz/">LinkedIn</a>)</p><blockquote><p><em>As equity research community&#8217;s interest in OpenAI grows, we are hiring a full-time role to engage closely with the analysts. Looking for an experienced large-cap equity research analyst from buy/sell-side. 
Arguably one of the best seats in the world to understand AI if you are curious. DM me if interested.</em></p></blockquote><p><strong>Millennium is hiring an AI engineer for equities technology</strong> (<a href="https://career.mlp.com/careers?pid=755953793051">MLP careers</a>)</p><p><strong>Norway&#8217;s sovereign wealth fund is hiring an engineer for LLM workflows</strong> (<a href="https://www.linkedin.com/posts/stiank_ai-machinelearning-nbim-activity-7414664735629201408-KGnY/">LinkedIn</a>)</p><h3>Interesting papers</h3><p><strong>LLM-driven investment strategies lose money over time because they struggle to account for regime change</strong> (<a href="https://arxiv.org/abs/2505.07078">arXiv.org</a>)</p><h3><strong>Follow for more case studies of LLMs+Investment Research</strong></h3><p class="button-wrapper" data-attrs="{&quot;url&quot;:&quot;https://blog.flatcircle.ai/subscribe?&quot;,&quot;text&quot;:&quot;Subscribe now&quot;,&quot;action&quot;:null,&quot;class&quot;:null}" data-component-name="ButtonCreateButton"><a class="button primary" href="https://blog.flatcircle.ai/subscribe?"><span>Subscribe now</span></a></p><p><em>If you would like to discuss incorporating LLMs into your research process, reply to this email or reach out via <a href="https://x.com/jdmoran">X</a> or <a href="https://www.linkedin.com/in/jamesdmoran/">LinkedIn</a>.</em></p>]]></content:encoded></item><item><title><![CDATA[Can LLMs make investment decisions?]]></title><description><![CDATA[Models are trading stocks and betting on prediction markets in new public "arenas"]]></description><link>https://blog.flatcircle.ai/p/can-llms-make-investment-decisions</link><guid isPermaLink="false">https://blog.flatcircle.ai/p/can-llms-make-investment-decisions</guid><dc:creator><![CDATA[Jim Moran]]></dc:creator><pubDate>Tue, 06 Jan 2026 13:06:29 GMT</pubDate><enclosure 
url="https://substackcdn.com/image/fetch/$s_!Re0z!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F18e7e67b-64e7-4929-acf2-85f0bb26ede1_2426x1256.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<h3>Key Takeaways</h3><ul><li><p>A growing number of public experiments have LLMs make investment decisions and forecast future events</p></li><li><p>There isn&#8217;t yet evidence of models beating the market at meaningful scale or with statistical significance. However, each new generation of frontier models tends to be less bad</p></li><li><p>OpenAI and Grok models tend to be better at trading while Claude tends to be better at pure forecasting</p></li><li><p>The arena designs take different approaches to the &#8220;catch-22s&#8221; that arise when LLMs make investment decisions, including: (i) choosing across many assets vs focusing tokens on a single asset, (ii) tool use vs context size, (iii) &#8220;real time&#8221; vs randomness, (iv) forecasting skill vs data access</p></li><li><p>We will monitor these arenas going forward, as they are prototypes for eventual institutional strategies</p></li></ul><p class="button-wrapper" data-attrs="{&quot;url&quot;:&quot;https://blog.flatcircle.ai/subscribe?&quot;,&quot;text&quot;:&quot;Subscribe now&quot;,&quot;action&quot;:null,&quot;class&quot;:&quot;button-wrapper&quot;}" data-component-name="ButtonCreateButton"><a class="button primary button-wrapper" href="https://blog.flatcircle.ai/subscribe?"><span>Subscribe now</span></a></p><h3>Background</h3><p>Hedge funds primarily use LLMs to make their teams more efficient, but everyone secretly wonders whether the models will eventually make investment decisions on their own.</p><p>There&#8217;s a growing universe of &#8220;arenas&#8221; where models are evaluated on their ability to predict events and make investment decisions. 
Unlike popular benchmarks such as <a href="https://evals.openai.com/gdpval/leaderboard">GDPval</a>, <a href="https://epoch.ai/benchmarks/gpqa-diamond">GPQA</a> and <a href="https://arcprize.org/">ARC</a>, the answers haven&#8217;t happened yet so there&#8217;s no risk of contamination, and they can&#8217;t be saturated because the market makes them harder every day. Also unlike typical benchmarks, these investing and forecasting arenas exactly match a highly valuable real world task.</p><h3>LLM Investing Arenas</h3><p><strong>Alpha Arena (</strong><a href="http://nof1.ai">nof1.ai</a>)</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://nof1.ai" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!weox!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F53ff786a-456a-4e39-adff-5ee8760e3d91_962x560.png 424w, https://substackcdn.com/image/fetch/$s_!weox!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F53ff786a-456a-4e39-adff-5ee8760e3d91_962x560.png 848w, https://substackcdn.com/image/fetch/$s_!weox!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F53ff786a-456a-4e39-adff-5ee8760e3d91_962x560.png 1272w, https://substackcdn.com/image/fetch/$s_!weox!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F53ff786a-456a-4e39-adff-5ee8760e3d91_962x560.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!weox!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F53ff786a-456a-4e39-adff-5ee8760e3d91_962x560.png" width="962" 
height="560" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/53ff786a-456a-4e39-adff-5ee8760e3d91_962x560.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:560,&quot;width&quot;:962,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:501605,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:&quot;https://nof1.ai&quot;,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:&quot;https://blog.flatcircle.ai/i/183589910?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F53ff786a-456a-4e39-adff-5ee8760e3d91_962x560.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!weox!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F53ff786a-456a-4e39-adff-5ee8760e3d91_962x560.png 424w, https://substackcdn.com/image/fetch/$s_!weox!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F53ff786a-456a-4e39-adff-5ee8760e3d91_962x560.png 848w, https://substackcdn.com/image/fetch/$s_!weox!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F53ff786a-456a-4e39-adff-5ee8760e3d91_962x560.png 1272w, https://substackcdn.com/image/fetch/$s_!weox!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F53ff786a-456a-4e39-adff-5ee8760e3d91_962x560.png 1456w" sizes="100vw" fetchpriority="high"></picture></div></a></figure></div><ul><li><p><strong>System design:</strong> In Alpha Arena, the models trade $10K each in real money across 7 stocks. Every ~2 minutes, the system asks each model to make a buy/sell/hold trading decision with context including its current portfolio, news, trading data and the original trade parameters. The arena is actually four separate arenas, each with a unique trading goal, to add statistical power to the final leaderboard.</p></li><li><p><strong>Result: </strong>All models eventually lost money, but Grok 4.20 (pre-release) and GPT-5.1 performed the best and made money in a couple of instances. 
Claude Sonnet 4.5 and Grok 4 performed the worst.</p></li><li><p><strong>My take: </strong>Very slick implementation of LLM trading, and I&#8217;m excited to see what they roll out in &#8220;season 2.&#8221; One challenge with LLMs is that they are usually non-deterministic, meaning they don&#8217;t produce the same answer every time. So if you call your model enough times, it might randomly dump your entire portfolio. The nof1.ai team solved this problem by prompting the model to create a trading plan (price target, stop loss, invalidation conditions, etc.), then feeding that same plan back in future calls. Another smart thing they do is ask each model about the narrative supporting each trade, and where it expects it to go.</p></li></ul><p><strong>AI Controls Stock Account</strong> (<a href="https://nathanbsmith729.substack.com/">Nathan Smith</a>)</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://nathanbsmith729.substack.com/" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!G7rH!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fafc6aa1c-e677-425d-8f7d-0926672bb0fe_1000x600.png 424w, https://substackcdn.com/image/fetch/$s_!G7rH!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fafc6aa1c-e677-425d-8f7d-0926672bb0fe_1000x600.png 848w, https://substackcdn.com/image/fetch/$s_!G7rH!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fafc6aa1c-e677-425d-8f7d-0926672bb0fe_1000x600.png 1272w, 
https://substackcdn.com/image/fetch/$s_!G7rH!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fafc6aa1c-e677-425d-8f7d-0926672bb0fe_1000x600.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!G7rH!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fafc6aa1c-e677-425d-8f7d-0926672bb0fe_1000x600.png" width="1000" height="600" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/afc6aa1c-e677-425d-8f7d-0926672bb0fe_1000x600.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:600,&quot;width&quot;:1000,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:&quot;https://nathanbsmith729.substack.com/&quot;,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!G7rH!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fafc6aa1c-e677-425d-8f7d-0926672bb0fe_1000x600.png 424w, https://substackcdn.com/image/fetch/$s_!G7rH!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fafc6aa1c-e677-425d-8f7d-0926672bb0fe_1000x600.png 848w, https://substackcdn.com/image/fetch/$s_!G7rH!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fafc6aa1c-e677-425d-8f7d-0926672bb0fe_1000x600.png 1272w, 
https://substackcdn.com/image/fetch/$s_!G7rH!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fafc6aa1c-e677-425d-8f7d-0926672bb0fe_1000x600.png 1456w" sizes="100vw" loading="lazy"></picture></div></a></figure></div><ul><li><p><strong>System design: </strong>Ran ChatGPT Deep Research once per week for six months to allocate real money across a universe of small cap healthcare stocks </p></li><li><p><strong>Result:</strong> -17%</p></li><li><p><strong>My take: </strong>This was a great, early implementation (especially by a high school student!). 
A challenge with this approach is that it uses one giant call to set its portfolio each week, which spreads its tokens across many different potential investment decisions. The name of the game is to burn as many tokens as possible on the most valuable decisions, so I believe it&#8217;s better to build up the portfolio through many smaller decisions. This experiment also highlighted another crucial issue with LLM-driven trading: portfolio construction. One of its core positions, ATYR, fell 83% when it announced failed Phase 3 trial results, and the portfolio never recovered. The problem wasn&#8217;t that the LLM should have known (the entire market was offsides); the problem was that it shouldn&#8217;t have been such a large position, given the source of edge had nothing to do with predicting drug trial results.</p></li></ul><p><strong>AI Investing Arena</strong> (<a href="https://investing-arena.bobbydhungana.com/">Bobby Dhungana</a>)</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://investing-arena.bobbydhungana.com/" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!hPl9!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F53e1f527-d6bd-4f17-84ef-6f9d0ce8d4f7_3230x1638.png 424w, https://substackcdn.com/image/fetch/$s_!hPl9!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F53e1f527-d6bd-4f17-84ef-6f9d0ce8d4f7_3230x1638.png 848w, https://substackcdn.com/image/fetch/$s_!hPl9!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F53e1f527-d6bd-4f17-84ef-6f9d0ce8d4f7_3230x1638.png 1272w, 
https://substackcdn.com/image/fetch/$s_!hPl9!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F53e1f527-d6bd-4f17-84ef-6f9d0ce8d4f7_3230x1638.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!hPl9!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F53e1f527-d6bd-4f17-84ef-6f9d0ce8d4f7_3230x1638.png" width="1456" height="738" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/53e1f527-d6bd-4f17-84ef-6f9d0ce8d4f7_3230x1638.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:738,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:358349,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:&quot;https://investing-arena.bobbydhungana.com/&quot;,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://blog.flatcircle.ai/i/183589910?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F53e1f527-d6bd-4f17-84ef-6f9d0ce8d4f7_3230x1638.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!hPl9!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F53e1f527-d6bd-4f17-84ef-6f9d0ce8d4f7_3230x1638.png 424w, https://substackcdn.com/image/fetch/$s_!hPl9!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F53e1f527-d6bd-4f17-84ef-6f9d0ce8d4f7_3230x1638.png 848w, 
https://substackcdn.com/image/fetch/$s_!hPl9!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F53e1f527-d6bd-4f17-84ef-6f9d0ce8d4f7_3230x1638.png 1272w, https://substackcdn.com/image/fetch/$s_!hPl9!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F53e1f527-d6bd-4f17-84ef-6f9d0ce8d4f7_3230x1638.png 1456w" sizes="100vw" loading="lazy"></picture></div></a></figure></div><ul><li><p><strong>System design:</strong> Models paper trade 5 ETFs (S&amp;P 500, Nasdaq, Gold, Interest Rates, Oil). 
The system asks each model to make a buy/sell/hold trading decision every ~30 minutes, with context including VIX volatility, Treasury yields, dollar strength and oil prices </p></li><li><p><strong>Result:</strong> Still active, started Nov 25. GPT-5 in the lead, Claude Sonnet 4.5 in last place, though all are ~breakeven </p></li><li><p><strong>My take: </strong>Inspired by Alpha Arena, with a similar implementation. But I like the focus on allocating across ETFs vs individual stocks. It&#8217;s possible the generalist nature of LLMs makes them better suited to allocating across sectors than to picking individual stocks, where they have an information disadvantage. But it&#8217;s too early to draw conclusions, as the experiment has only been running since late November.</p></li></ul><p><strong>AI Arena (</strong><a href="https://rallies.ai/arena">rallies.ai</a>)</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://rallies.ai/arena" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!Re0z!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F18e7e67b-64e7-4929-acf2-85f0bb26ede1_2426x1256.png 424w, https://substackcdn.com/image/fetch/$s_!Re0z!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F18e7e67b-64e7-4929-acf2-85f0bb26ede1_2426x1256.png 848w, https://substackcdn.com/image/fetch/$s_!Re0z!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F18e7e67b-64e7-4929-acf2-85f0bb26ede1_2426x1256.png 1272w, 
https://substackcdn.com/image/fetch/$s_!Re0z!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F18e7e67b-64e7-4929-acf2-85f0bb26ede1_2426x1256.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!Re0z!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F18e7e67b-64e7-4929-acf2-85f0bb26ede1_2426x1256.png" width="1456" height="754" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/18e7e67b-64e7-4929-acf2-85f0bb26ede1_2426x1256.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:754,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:474135,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:&quot;https://rallies.ai/arena&quot;,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://blog.flatcircle.ai/i/183589910?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F18e7e67b-64e7-4929-acf2-85f0bb26ede1_2426x1256.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!Re0z!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F18e7e67b-64e7-4929-acf2-85f0bb26ede1_2426x1256.png 424w, https://substackcdn.com/image/fetch/$s_!Re0z!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F18e7e67b-64e7-4929-acf2-85f0bb26ede1_2426x1256.png 848w, 
https://substackcdn.com/image/fetch/$s_!Re0z!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F18e7e67b-64e7-4929-acf2-85f0bb26ede1_2426x1256.png 1272w, https://substackcdn.com/image/fetch/$s_!Re0z!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F18e7e67b-64e7-4929-acf2-85f0bb26ede1_2426x1256.png 1456w" sizes="100vw" loading="lazy"></picture></div></a></figure></div><ul><li><p><strong>System design:</strong> Models maintain their own portfolios, evaluating them every few days. 
Architecture includes custom MCP servers and tool calls to distill a large universe of potential investments into a few potential trades, so the decision model can focus on choosing among a few quality options. </p></li><li><p><strong>Result:</strong> Almost every model is making money, led by DeepSeek and Grok-4, but it&#8217;s still early</p></li><li><p><strong>My take: </strong>This architecture addresses the catch 22 of wanting the model to select among as many assets as possible, while still focusing as many tokens as possible on individual decisions. Their solution is an extensive screening step to first identify stocks at technical extremes, with unusual options flow, interesting fundamentals and near-term catalysts. Still too early to draw conclusions; it needs to go through an earnings cycle. Factors likely explain most of the move so far.</p></li></ul><p><strong>Flat Circle Arena </strong>(<a href="https://blog.flatcircle.ai/p/flat-circle-contrasting-good-vs-poor">Flat Circle</a>) </p><div class="captioned-image-container"><figure><a class="image-link image2" target="_blank" href="https://blog.flatcircle.ai/p/flat-circle-contrasting-good-vs-poor" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!Pp8i!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffeb9385d-f185-4f55-8c91-1673fca33514_951x237.png 424w, https://substackcdn.com/image/fetch/$s_!Pp8i!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffeb9385d-f185-4f55-8c91-1673fca33514_951x237.png 848w, https://substackcdn.com/image/fetch/$s_!Pp8i!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffeb9385d-f185-4f55-8c91-1673fca33514_951x237.png 1272w, 
https://substackcdn.com/image/fetch/$s_!Pp8i!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffeb9385d-f185-4f55-8c91-1673fca33514_951x237.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!Pp8i!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffeb9385d-f185-4f55-8c91-1673fca33514_951x237.png" width="951" height="237" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/feb9385d-f185-4f55-8c91-1673fca33514_951x237.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:237,&quot;width&quot;:951,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:&quot;https://blog.flatcircle.ai/p/flat-circle-contrasting-good-vs-poor&quot;,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!Pp8i!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffeb9385d-f185-4f55-8c91-1673fca33514_951x237.png 424w, https://substackcdn.com/image/fetch/$s_!Pp8i!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffeb9385d-f185-4f55-8c91-1673fca33514_951x237.png 848w, https://substackcdn.com/image/fetch/$s_!Pp8i!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffeb9385d-f185-4f55-8c91-1673fca33514_951x237.png 1272w, 
https://substackcdn.com/image/fetch/$s_!Pp8i!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffeb9385d-f185-4f55-8c91-1673fca33514_951x237.png 1456w" sizes="100vw" loading="lazy"></picture><div></div></div></a></figure></div><ul><li><p><strong>System design:</strong> Models paper traded individual earnings during 4Q24</p></li><li><p><strong>Results: </strong>OpenAI o1 and Grok-2 performed the best, while Claude Sonnet 3.5 performed the worst. o1 performed much better than o3-mini. Opus performed much better than Sonnet. The more expensive models outperformed the cheaper ones</p></li><li><p><strong>My take:</strong> This was an early, rudimentary effort. One advantage to focusing entirely on earnings is the results are &#8220;pure idio&#8221; - i.e., market and other factors have limited impact on the returns, so you&#8217;re almost entirely measuring LLMs&#8217; ability to beat other investors. While these results were promising, results in subsequent earnings periods deteriorated as they entered different market environments (e.g., Liberation Day, the AI capex boom). Another limitation was that this strategy focused on large-cap stocks. 
It&#8217;s possible LLMs are more effective on the longer tail where there&#8217;s less competition.</p></li></ul><h3>LLM Forecasting Arenas</h3><p>Relatedly, there are a handful of &#8220;forecasting arenas&#8221; where, instead of making investment decisions, models bet on prediction markets or forecast events.</p><ul><li><p><a href="https://www.metaculus.com/aib/">AI Forecasting Benchmark</a> by <a href="https://www.metaculus.com/">Metaculus</a> </p></li><li><p><a href="https://evals.futuresearch.ai/">Deep Research Bench</a> by <a href="https://futuresearch.ai/">FutureSearch</a></p></li><li><p><a href="https://forecasterarena.com/">Forecaster Arena</a> by <a href="https://www.linkedin.com/in/mert-gulsun/">Mert Gulson</a></p></li><li><p><a href="https://www.forecastbench.org/tournament/">ForecastBench</a> by <a href="https://forecastingresearch.org/">Forecasting Research Institute</a></p></li><li><p><a href="https://futurex-ai.github.io/">FutureX</a> by <a href="https://seed.bytedance.com/">ByteDance Seed</a></p></li><li><p><a href="https://www.gjopen.com/">Good Judgment Open</a> by <a href="https://goodjudgment.com/">Good Judgment</a></p></li><li><p><a href="https://manifold.markets/">Manifold Markets</a></p></li><li><p><a href="https://predibench.com/">PrediBench</a> by <a href="https://presagelabs.com/">Presage Labs</a></p></li><li><p><a href="https://www.prophetarena.co/agent-leaderboard">Prophet Arena</a> by <a href="https://www.haifeng-xu.com/sigma/index.html">Sigma Research Lab @UChicago</a></p></li></ul><p><strong>My take:</strong> Forecasting arenas are a purer benchmark of LLMs&#8217; ability to predict the future, and it turns out LLMs are pretty good at it. Results generally show leading forecasting models with a winning hit rate when betting on Polymarket or Kalshi (though it&#8217;s unclear if that&#8217;s good enough to win at any real scale). 
</p><p>Even the best models aren&#8217;t able to beat the best human forecasters (not sure they ever will, as the best forecasters also have access to LLMs). </p><p>Claude Opus and Sonnet appear the strongest at pure forecasting (unlike in the investing arenas, where they&#8217;re often the weakest). How could this be true? One theory is that Claude has the most analytical rigor (using base rates and proper scenario analysis) but weaker access to tools like Google / x.com search that are more important for investing. This is where Grok, Gemini and OpenAI are strongest. </p><h3>Catch 22s when LLMs make investment decisions</h3><p>These arenas show various ways to address the &#8220;catch 22s&#8221; in LLMs making investment decisions:</p><ul><li><p>Wanting the model to select among as many assets as possible vs. focusing all your tokens on a single asset?</p></li><li><p>Providing access to as many tools as possible vs. managing to an optimal context size?</p></li><li><p>Allowing the models to make decisions in &#8220;real time&#8221; vs. accepting more randomness the more times you call the model?</p></li><li><p>Choosing the best forecaster model (Claude) vs. the ones with proprietary data access (Gemini, Grok)?</p></li></ul><p>There&#8217;s no evidence yet of LLMs beating the market with any scale or statistical significance. However, there are going to be a lot of new models and architectures released in 2026. 
With these improvements, we expect to see more institutional focus on LLMs making investment decisions.</p><h3>Follow for more on investing arenas and other creative LLM use cases</h3><p class="button-wrapper" data-attrs="{&quot;url&quot;:&quot;https://blog.flatcircle.ai/subscribe?&quot;,&quot;text&quot;:&quot;Subscribe now&quot;,&quot;action&quot;:null,&quot;class&quot;:null}" data-component-name="ButtonCreateButton"><a class="button primary" href="https://blog.flatcircle.ai/subscribe?"><span>Subscribe now</span></a></p><p><em>If you would like to discuss incorporating LLMs into your research process, reply to this email or reach out via <a href="https://x.com/jdmoran">X</a> or <a href="https://www.linkedin.com/in/jamesdmoran/">LinkedIn</a>.</em></p>]]></content:encoded></item><item><title><![CDATA[Creative ways hedge funds are using LLMs - Dec 30, 2025]]></title><description><![CDATA[Grading sellside analysts, identifying technical defaults, organizing Iran Notice disclosures and more]]></description><link>https://blog.flatcircle.ai/p/creative-ways-hedge-funds-are-using-10a</link><guid isPermaLink="false">https://blog.flatcircle.ai/p/creative-ways-hedge-funds-are-using-10a</guid><dc:creator><![CDATA[Jim Moran]]></dc:creator><pubDate>Tue, 30 Dec 2025 19:02:28 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!ulyQ!,w_256,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F22afde1e-8b68-470c-bb7c-cda75746a522_512x512.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<h3>Grading sellside analysts</h3><p><strong>A former sellside analyst used an LLM to <a href="https://knowtrend.ai/hindsightSpeakers?award_type=detective">analyze 10 years of earnings calls and identify analysts with the most prescient questions</a>. 
</strong>Examples:</p><p>Umer Raffat from UBS - Biogen (BIIB) Earnings Call - Oct 2019:</p><blockquote><p><em>&#8230;the implication in the data is that the high with insufficient exposure at the high dose, the second trial worked as well. But when we look at CDR Sum of the Boxes low dose actually looks more consistent than the high dose&#8230;how spot-on is that finding on patients that had a sufficient exposure?</em></p></blockquote><p>LLM reasoning: </p><blockquote><p><em>Raffat immediately spotted the fragility of the efficacy claim&#8212;dependence on a post-hoc exposure subset rather than the pre-specified ITT, foreshadowing the FDA advisory-panel scepticism and eventual withdrawal.</em></p></blockquote><p>Return since: -20%</p><p>Vivek Arya from Bank of America - Broadcom (AVGO) - Dec 2023:</p><blockquote><p><em>yesterday one of your peers suggested that the market for AI accelerators could be as large as $400 billion... how does Broadcom participate... and what does this larger AI accelerator market imply for your Ethernet networking business?</em></p></blockquote><p>LLM reasoning: </p><blockquote><p><em>Arya&#8217;s question forced management to quantify AI&#8217;s revenue contribution and linkage to networking months before the Street built outsized AI models. The $400B TAM figure sounded extreme in Dec-2023 yet is now consensus; Ethernet&#8217;s role in AI back-end networks&#8212;dismissed by many at the time&#8212;is today a core driver of Broadcom&#8217;s upside.</em></p></blockquote><p>Return since: +338%</p><p>My take: Great LLM use case, figuring out which analysts were right for the right reasons. Would be interesting to run something similar on initiations / ratings changes, controlling for sector returns, comparing insights vs peers. 
Of course, I&#8217;d want to know what these analysts are asking about next (here are the analyst recommendation pages for <a href="https://www.tipranks.com/experts/analysts/umer-raffat">Umer Raffat</a> and <a href="https://www.tipranks.com/experts/analysts/vivek-arya">Vivek Arya</a>).</p><h3>Grading your own analysts</h3><p><em><strong>Walleye Capital <a href="https://every.to/podcast/transcript-6ee07ebc-1598-401b-bfa4-ef39ba70af47">records all internal conversations to analyze which of its team members are most prescient</a></strong></em></p><blockquote><p><em>&#8230;we really record every single Zoom, every single call.&#8230;So a big part of my job overseeing the risk of the firm, the chief investment officer title, every single morning, me and my risk team, sort of in the control center of running this giant process we have our risk calls and those are all recorded and we can go back and say, hey what were we talking about at this time? And continually have LLMs that are, that are processing those transcripts and helping us to both remember and provide insights and ultimately be a bit predictive, which has been hugely helpful just in that exercise, which is&#8212; We haven&#8217;t sort of talked about where I think this is going in the power of all this. And I mean, I do believe that we&#8217;re that we&#8217;re a leader. 
I don&#8217;t want to say we&#8217;re the leader because I definitely don&#8217;t know what other firms are doing, but I certainly think that we are a bit more advanced in our thinking of how to use these tools, but we&#8217;re just scratching the surface of what is possible once you actually start connecting all bits of information within the walls of the firm&#8230;</em></p></blockquote><p>My take: I would recommend an intermediate step that returns verbatim excerpts first; folks are going to want a lot of auditability.</p><h3>Finding short opportunities in bond indentures</h3><p><strong>LLMs may have identified the opportunity to <a href="https://www.thediff.co/archive/brokers-and-dealers-in-talent/">accelerate Avid Bioservices&#8217; debt and short its stock</a>. </strong>In early 2024, Avid Bioservices received an acceleration notice because it had failed to remove a restrictive legend on its 2026 notes, causing it to be in default. Avid shares declined 28% when it revealed it needed to raise $160mm in a private placement to redeem the notes.</p><p>Byrne Hobart covered this in The Diff, concluding:</p><blockquote><p><em>And, right now, it&#8217;s suddenly gotten much easier to do this at scale: you can unleash LLMs on indenture agreements, and try to find edge cases that the company didn&#8217;t think of or notice. These will all be technicalities in practice; in the Avid case, if the restriction had been a big deal to the note owner, they probably would have noticed right away. 
But, perhaps coincidentally, they only noticed after the newly-widespread availability of tools that can trawl through vast amounts of text to extract useful information.</em></p></blockquote><p>In Money Stuff, Matt Levine wrote that <a href="https://www.bloomberg.com/opinion/articles/2024-03-13/to-buy-a-bank-you-have-to-be-a-bank">he was skeptical an LLM found this opportunity</a>.</p><p>My take: Hard to be sure, but I think this opportunity <em>was</em> found by an LLM, because the acceleration notice was received just two weeks after Google released Gemini 1.5 Pro (the first time a 1mm-token context window was generally available), enabling the analysis of huge documents. It would have been straightforward for any fund to cycle through indentures to identify technicalities that could merit an acceleration notice. In fact, this would make a pretty interesting eval for new models that get released: run them against a huge corpus of indenture agreements and see what new opportunities get identified.</p><h3>Iran Notice disclosures</h3><p><strong>John Friedman, CEO of Datamule, collected and published a <a href="https://medium.com/@jgfriedman99/extracting-iran-disclosures-from-sec-filings-and-vectorizing-them-for-semantic-search-7c58a929d8e7">searchable dataset of Iran Notice disclosures from SEC filings</a></strong></p><p>My take: One way I think about LLMs is that they enable instant creation of datasets that are plausibly interesting but not worth hiring and waiting for a human team to build.</p><h3>Interesting job posts</h3><p><em>Select buyside LLM-related job descriptions:</em></p><p><strong>Point72 - <a href="https://www.linkedin.com/jobs/view/ai-engineer-investment-research-workflows-at-point72-4346046683/">AI Engineer &#8211; Investment Research &amp; Workflows</a> ($150K-$200K)</strong></p><blockquote><p><em>This role partners directly with L/S equity portfolio managers, analysts, and business leadership to build innovative solutions to improve efficiency and 
research quality across the equities platform&#8230;</em></p></blockquote><p><strong>Longaeva (new Baly platform)</strong> - <strong><a href="https://www.linkedin.com/jobs/view/research-product-associate-ai-enablement-at-balyasny-asset-management-l-p-4325327934/">Research Product Associate - AI Enablement</a></strong></p><blockquote><p><em>Longaeva is adding an associate to join the proprietary research team to accelerate adoption of generative AI products across investment strategies. In this role, you will embed directly with the proprietary research and investment teams to build solutions that impact investment decisions. We are seeking a capable, technical candidate&#8212;someone able to do hands-on research product development, web scraping, and LLM/AI-powered synthesis of qualitative and quantitative data. The ideal candidate blends scrappy coding, data/information aggregation, and a strong product intuition, with a proven ability to ship projects fast and independently. You will translate our AI capabilities into actionable insights by rapidly prototyping agentic workflows, building novel research products, and driving adoption of in-house tools.</em></p></blockquote><p><strong>Bayview ($30b AUM credit firm) - <a href="https://jobs.wallstreetcareers.com/jobs/172134296-llm-analyst">LLM Analyst</a> ($90K - $110K)</strong></p><blockquote><p><em>The Research team at Bayview Asset Management is hiring an LLM Analyst to unlock insight from large volumes of textual data, both external and internal, to inform investment theses, improve operations, and answer foundational questions about the mortgage industry and more broadly, the economy...Meet with portfolio managers, traders, marketing and servicing teams to identify and narrow down the question. Understand the business context behind each question&#8230;.<br>Prototype quickly but evaluate rigorously: Design prompts, run experiments in notebooks and concisely synthesize results for fast iteration. 
Define clear success metrics to measure progress.</em></p></blockquote><p><strong>xAI - <a href="https://www.linkedin.com/jobs/view/4311184558/">AI Buy-Side Finance Tutor</a> ($45/hr)</strong></p><blockquote><p><em>We are seeking a skilled AI Buy-Side Finance Data Specialist to enhance xAI&#8217;s AI models by providing high-quality data annotations and inputs tailored to buy-side finance contexts. In this role, you will leverage your expertise in portfolio management, hedge fund strategies, private equity investments, venture capital deal sourcing, and high-frequency trading algorithms to support the training of AI systems. You will collaborate with technical teams to refine annotation tools and curate impactful data, ensuring our models effectively capture real-world buy-side finance dynamics.</em></p></blockquote><p>My Take: Interesting these are mainly early career type hires, no graduate degree required. Looks like funds are happy to build on top of frontier models and popular tools. Lots of focus on prototyping and experimentation. 
The Grok LLM trainer hourly rate feels a little light!!</p><h3><strong>Follow for more case studies of LLMs+Investment Research</strong></h3><p class="button-wrapper" data-attrs="{&quot;url&quot;:&quot;https://blog.flatcircle.ai/subscribe?&quot;,&quot;text&quot;:&quot;Subscribe now&quot;,&quot;action&quot;:null,&quot;class&quot;:null}" data-component-name="ButtonCreateButton"><a class="button primary" href="https://blog.flatcircle.ai/subscribe?"><span>Subscribe now</span></a></p><p><em>If you would like to discuss incorporating LLMs into your research process, reply to this email or reach out via <a href="https://x.com/jdmoran">X</a> or <a href="https://www.linkedin.com/in/jamesdmoran/">LinkedIn</a>.</em></p>]]></content:encoded></item><item><title><![CDATA[Creative ways hedge funds are using LLMs]]></title><description><![CDATA[Tracking case studies of investors using LLMs in their research process]]></description><link>https://blog.flatcircle.ai/p/creative-ways-hedge-funds-are-using</link><guid isPermaLink="false">https://blog.flatcircle.ai/p/creative-ways-hedge-funds-are-using</guid><dc:creator><![CDATA[Jim Moran]]></dc:creator><pubDate>Tue, 23 Dec 2025 20:29:22 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!ulyQ!,w_256,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F22afde1e-8b68-470c-bb7c-cda75746a522_512x512.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<h3><strong>Learning from prediction markets</strong></h3><p><strong>Rick Bhowmick, Head of Data Eng at Coatue, <a href="https://x.com/pathikrit_wrick/status/1986780625630540106">built a system that reads Polymarket and Kalshi and generates a daily investment newsletter based on odds changes</a></strong></p><p>Check out <a href="https://pathikrit.github.io/zeitgeist/2025/12/23/">today&#8217;s newsletter</a> and here&#8217;s the <a href="https://github.com/pathikrit/zeitgeist">github</a> if you want to 
try yourself.</p><p>My take: I think this is really cool, and modified it to send a daily email to my extended family which has spurred some interesting discussions.</p><h3><strong>Shareholder activism</strong></h3><p><strong>Askeladden Capital, a long-only small/micro-cap value investing firm, <a href="https://askeladdencapital.com/wp-content/uploads/2025/10/2025-08-10-Askeladden-Capital-Q2-2024-Letter-10X.pdf">used LLMs in a proxy battle</a></strong></p><blockquote><p><em>During our proxy contest at AstroNova, AI tools helped us produce the <a href="https://www.streetinsider.com/SEC+Filings/Form+DFAN14A+AstroNova%2C+Inc.+Filed+by%3A+ASKELADDEN+CAPITAL+MANAGEMENT+LLC/24925927.html">extensive ISS deck</a> and other materials that would have been vastly more costly to prepare otherwise (i.e., we would have needed to spend an incremental six figures, and/or developed far lower quality work-product). We earned endorsements from both ISS and Glass Lewis, and the incumbent board requested the CEO&#8217;s resignation. I believe, though it is of course impossible to verify, that we may well have run the first AI-powered proxy contest in history.</em></p></blockquote><p>My take: LLMs should enable more proxy contests against smaller companies that previously weren&#8217;t &#8216;worth it.&#8217; One challenge is the eval on something like this, given these are relatively infrequent and the feedback loop is so long. Will most likely be leveraged by advisory firms and investors with direct experience creating these materials.</p><h3><strong>Expert interviews</strong></h3><p><strong>CEO of primary research firm, Kane &amp; Company, <a href="https://www.linkedin.com/posts/mark-kane-jr_before-talking-to-experts-we-run-ai-queries-activity-7409235333550153728-Rnoa/">uses LLMs to drive more meaningful diligence conversations</a></strong></p><blockquote><p><em>Before talking to experts, we run AI queries on the market and competitive landscape. 
This gives us a baseline of what public sources and models already know.</em></p><p><em>The baseline serves two purposes. It helps us write better questions because we know the common answers and can push past them.</em></p><p><em>It also becomes our quality control filter. When an expert&#8217;s answer matches ChatGPT&#8217;s output too closely, we flag it.</em></p><p><em>Say for example we asked ChatGPT about the Canadian IT outsourcing market before starting expert calls. It tells us growth was 18 percent. An expert later gave us the exact same number with the same framing. We know to ask where that figure came from and what assumptions drove it.</em></p></blockquote><p>My take: one challenge with expert networks is &#8220;professional experts&#8221; - folks who make their living doing calls and are too far removed from the actual industry to offer true insights. It&#8217;s helpful to get a few reps in first with ChatGPT, which is often regurgitating the same content these folks read anyway. Then you&#8217;ll be better positioned to push past it.</p><h3><strong>Expert interviews (again)</strong></h3><p><strong>An AI marketing firm <a href="https://www.99ravens.agency/resources/blogs/your-experts-wont-train-your-ai-you-have-to-interview-them/">built an LLM system that interviews experts automatically</a></strong></p><blockquote><p><em>The only way to capture true expertise is to build an AI interviewer that earns the trust to be seen as a peer. An equal. One that asks questions so insightful the expert reveals the distinctive methodologies they&#8217;d normally only share with another seasoned professional. That is the real technical hurdle. Here is how we cleared it. &#8230;</em></p><p><em>The Note-Taker is an internal tool, invisible to the expert, that continuously analyzes the conversation. 
The Interviewer queries it for structured progress reports.</em></p><p><em>What the Note-Taker Tracks:</em></p><p><em>Coverage Analysis: Topics explored with confidence levels (high/medium/low).</em></p><p><em>Gap Identification: Required areas not yet addressed, prioritized by importance.</em></p><p><em>Time Status: Pacing assessment against the target duration, with wrap-up triggers.</em></p><p><em>Pattern Detection: Emerging themes, contradictions, or when the expert defaults to generic &#8220;best practices.&#8221;</em></p><p><em>Next Action: A specific suggestion for the next probe.</em></p><p><em>The key insight is that the Note-Taker returns structured data, not prose. This prevents the Interviewer from getting confused by a second voice. If the Note-Taker flags a gap in &#8220;decision-making frameworks,&#8221; the Interviewer integrates the suggestion naturally: &#8220;You mentioned evaluating channels&#8212;walk me through a recent decision where you chose not to invest somewhere.&#8221;</em></p></blockquote><p>My take: I think we&#8217;re still a ways away from LLMs running effective investor style expert network interviews. The example in this post is for executives who want to be interviewed, and I think the best investment insights come from human run interviews, ideally unpaid ones with trusted relationships. However, the AI notetaker seems a very useful tool for human interviewers. 
It could let people who are very good at lining up calls with experts conduct the interviews themselves, instead of relying on expert analysts to run them.</p><h3>Follow for more case studies of LLMs+Investment Research</h3><p class="button-wrapper" data-attrs="{&quot;url&quot;:&quot;https://blog.flatcircle.ai/subscribe?&quot;,&quot;text&quot;:&quot;Subscribe now&quot;,&quot;action&quot;:null,&quot;class&quot;:null}" data-component-name="ButtonCreateButton"><a class="button primary" href="https://blog.flatcircle.ai/subscribe?"><span>Subscribe now</span></a></p><p><em>If you have any interesting examples or would like to discuss incorporating LLMs into your research process, reply to this email or reach out via <a href="https://x.com/jdmoran">X</a> or <a href="https://www.linkedin.com/in/jamesdmoran/">LinkedIn</a>.</em></p>]]></content:encoded></item><item><title><![CDATA[How well do LLMs forecast company KPIs?]]></title><description><![CDATA[Benchmarking forecasting models on the hedge fund use case]]></description><link>https://blog.flatcircle.ai/p/introducing-the-flat-circle-arena</link><guid isPermaLink="false">https://blog.flatcircle.ai/p/introducing-the-flat-circle-arena</guid><dc:creator><![CDATA[Jim Moran]]></dc:creator><pubDate>Thu, 16 Oct 2025 04:09:09 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!fWPI!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9731944c-a4a6-42b9-934c-acefd6b6e40f_1008x742.jpeg" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>There&#8217;s a new trend of AI forecasting arenas like <a href="https://www.metaculus.com/aib/">Metaculus AIB</a> and <a href="https://futurex-ai.github.io/">FutureX</a>, where LLM engineers compete to forecast a broad range of future events. </p><p>We analyzed how the top forecasting models perform on a discretionary hedge fund workflow. 
</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://flatcircle.ai" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!fWPI!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9731944c-a4a6-42b9-934c-acefd6b6e40f_1008x742.jpeg 424w, https://substackcdn.com/image/fetch/$s_!fWPI!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9731944c-a4a6-42b9-934c-acefd6b6e40f_1008x742.jpeg 848w, https://substackcdn.com/image/fetch/$s_!fWPI!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9731944c-a4a6-42b9-934c-acefd6b6e40f_1008x742.jpeg 1272w, https://substackcdn.com/image/fetch/$s_!fWPI!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9731944c-a4a6-42b9-934c-acefd6b6e40f_1008x742.jpeg 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!fWPI!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9731944c-a4a6-42b9-934c-acefd6b6e40f_1008x742.jpeg" width="1008" height="742" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/9731944c-a4a6-42b9-934c-acefd6b6e40f_1008x742.jpeg&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:742,&quot;width&quot;:1008,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:118967,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/jpeg&quot;,&quot;href&quot;:&quot;https://flatcircle.ai&quot;,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:&quot;https://blog.flatcircle.ai/i/176292980?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9731944c-a4a6-42b9-934c-acefd6b6e40f_1008x742.jpeg&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!fWPI!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9731944c-a4a6-42b9-934c-acefd6b6e40f_1008x742.jpeg 424w, https://substackcdn.com/image/fetch/$s_!fWPI!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9731944c-a4a6-42b9-934c-acefd6b6e40f_1008x742.jpeg 848w, https://substackcdn.com/image/fetch/$s_!fWPI!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9731944c-a4a6-42b9-934c-acefd6b6e40f_1008x742.jpeg 1272w, https://substackcdn.com/image/fetch/$s_!fWPI!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9731944c-a4a6-42b9-934c-acefd6b6e40f_1008x742.jpeg 1456w" sizes="100vw" fetchpriority="high"></picture></div></a></figure></div>
<p><strong>History of forecasting</strong></p><p>The forecasting community was popularized about a decade ago by <a href="https://en.wikipedia.org/wiki/Philip_E._Tetlock">Prof Philip Tetlock</a>&#8217;s book <a href="https://www.amazon.com/Superforecasting-Science-Prediction-Philip-Tetlock/dp/0804136718">Superforecasting</a>. 
Competitions like the <a href="https://www.gjopen.com/">Good Judgment Open</a> and <a href="https://www.metaculus.com/">Metaculus</a> are months-long contests where forecasting professionals predict the outcomes of events that will resolve in the coming months, such as elections, sports championships, weather and wars.</p><p>Recently, the community has started to accelerate as forecasters use LLMs to automate some of the manual research and calculation steps. Estimates that used to take hours or days can now be done in seconds. Forecasting expertise is now being channeled into specialty LLM system design so the marginal forecast can be automated. </p><p>This explains why we&#8217;re seeing these new LLM forecasting arenas, with thousands of individuals and teams competing in them. Historically, hedge funds haven&#8217;t had much overlap with the forecasting community - too many questions requiring too much specialized knowledge, with answers needed on too short a time horizon. </p><p>We think this is about to change now that LLM forecasting systems can provide answers in real time. 
</p><p class="button-wrapper" data-attrs="{&quot;url&quot;:&quot;https://blog.flatcircle.ai/subscribe?&quot;,&quot;text&quot;:&quot;Subscribe now&quot;,&quot;action&quot;:null,&quot;class&quot;:null}" data-component-name="ButtonCreateButton"><a class="button primary" href="https://blog.flatcircle.ai/subscribe?"><span>Subscribe now</span></a></p>]]></content:encoded></item><item><title><![CDATA[Flat Circle - How Claude 3.7 makes better investment decisions]]></title><description><![CDATA[Plus: 1 new research paper, 3 new articles and 4 new hedge fund LLM jobs]]></description><link>https://blog.flatcircle.ai/p/flat-circle-how-claude-37-makes-better</link><guid isPermaLink="false">https://blog.flatcircle.ai/p/flat-circle-how-claude-37-makes-better</guid><dc:creator><![CDATA[Jim Moran]]></dc:creator><pubDate>Tue, 25 Feb 2025 18:18:28 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!pmL5!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Feef1a09e-9b49-40ac-b276-40bd82c36e35_2600x2360.webp" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p><em>Flat Circle measures the ability of language models to predict company earnings results. See our <a href="https://www.flatcircle.ai/p/flat-circle-llm-benchmark-methodology">methodology</a> for detail and disclaimers. 
If you haven&#8217;t already subscribed, join investors and engineers interested in LLMs+investment research here:</em></p><p class="button-wrapper" data-attrs="{&quot;url&quot;:&quot;https://blog.flatcircle.ai/subscribe?&quot;,&quot;text&quot;:&quot;Subscribe now&quot;,&quot;action&quot;:null,&quot;class&quot;:null}" data-component-name="ButtonCreateButton"><a class="button primary" href="https://blog.flatcircle.ai/subscribe?"><span>Subscribe now</span></a></p><div><hr></div><h2>Claude 3.7 and 3.5 make different trading decisions given the same information</h2><p>Yesterday, <a href="https://www.anthropic.com/news/claude-3-7-sonnet">Anthropic released Claude 3.7 Sonnet</a> - which shows superior reasoning scores to OpenAI o1 and DeepSeek R1. </p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://www.anthropic.com/news/claude-3-7-sonnet" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!pmL5!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Feef1a09e-9b49-40ac-b276-40bd82c36e35_2600x2360.webp 424w, https://substackcdn.com/image/fetch/$s_!pmL5!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Feef1a09e-9b49-40ac-b276-40bd82c36e35_2600x2360.webp 848w, https://substackcdn.com/image/fetch/$s_!pmL5!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Feef1a09e-9b49-40ac-b276-40bd82c36e35_2600x2360.webp 1272w, https://substackcdn.com/image/fetch/$s_!pmL5!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Feef1a09e-9b49-40ac-b276-40bd82c36e35_2600x2360.webp 1456w" sizes="100vw"><img 
src="https://substackcdn.com/image/fetch/$s_!pmL5!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Feef1a09e-9b49-40ac-b276-40bd82c36e35_2600x2360.webp" width="1456" height="1322" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/eef1a09e-9b49-40ac-b276-40bd82c36e35_2600x2360.webp&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:1322,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:&quot;Benchmark table comparing frontier reasoning models&quot;,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:&quot;https://www.anthropic.com/news/claude-3-7-sonnet&quot;,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="Benchmark table comparing frontier reasoning models" title="Benchmark table comparing frontier reasoning models" srcset="https://substackcdn.com/image/fetch/$s_!pmL5!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Feef1a09e-9b49-40ac-b276-40bd82c36e35_2600x2360.webp 424w, https://substackcdn.com/image/fetch/$s_!pmL5!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Feef1a09e-9b49-40ac-b276-40bd82c36e35_2600x2360.webp 848w, https://substackcdn.com/image/fetch/$s_!pmL5!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Feef1a09e-9b49-40ac-b276-40bd82c36e35_2600x2360.webp 1272w, 
https://substackcdn.com/image/fetch/$s_!pmL5!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Feef1a09e-9b49-40ac-b276-40bd82c36e35_2600x2360.webp 1456w" sizes="100vw" fetchpriority="high"></picture></div></a></figure></div><p>Last night, we fed Claude 3.7 and Claude 3.5 the same information about 10 companies scheduled to report earnings this morning and asked both to make BUY/SELL calls for each stock. 
</p><p>Claude 3.7 Sonnet offers an &#8220;extended thinking mode&#8221; that lets the user specify a budget for the number of reasoning tokens it should use while formulating a response. While we maxed out the reasoning budget, there is a limit to how high the budget can be set and the model often doesn&#8217;t use its entire budget. Each call used only about 4 cents&#8217; worth of reasoning tokens, though you can imagine a future with investors spending 4 million dollars on a single investment decision.</p><p>Across the 10 names, both Claude 3.7 and 3.5 lost money because they incorrectly recommended BUY for Sempra (SRE), which missed on revenue and guidance and is currently down ~20%.</p><p><strong>Overall, Claude 3.7 outperformed Claude 3.5 by about 10% because it correctly called BUY for Bank of Montreal (BMO), while Claude 3.5 recommended SELL. </strong>BMO beat earnings and is currently up about 5%.</p><p>Both generations of the Sonnet model break their reasoning steps into similar concepts, but the newer generation appears to think harder and more deeply. We&#8217;ve contrasted the two models&#8217; reasoning on BMO below:</p><h4>Credit cycle observations</h4><p><strong>Claude 3.5</strong></p><pre><code>Credit Quality Deterioration:
- In Q4 2024, BMO reported significantly higher provisions for credit losses (PCL) of $1,523M vs $446M year-over-year
- Management noted credit performance deteriorated more than anticipated throughout 2024
- While Q4 2024 may represent a "high point" for provisions, they are expected to "remain elevated" through 2025</code></pre><p><strong>Claude 3.7 - extended thinking mode</strong></p><pre><code>Credit Loss Cycle Appears to Have Peaked
- In Q4 2024, management explicitly stated that Q4 represented a "high point" for provisions for credit losses (PCLs) that would "begin to moderate through 2025"
- The bank has already recognized substantial credit losses in 2024 (PCLs reached $1,523M in Q4 alone)
- Interest rate cuts have begun, which should gradually reduce pressure on borrowers
- The early recognition of credit issues and proactive provisioning positions BMO well for improving credit metrics
- Prior credit cycles show the stock typically recovers strongly once problems are fully recognized and addressed</code></pre><h4><strong>Peer commentary</strong></h4><p><strong>Claude 3.5</strong></p><pre><code>Challenging Operating Environment:
- Peer commentary (RBC, Scotiabank) indicates continued pressure on Canadian consumers from higher rates and rising unemployment
- Economic growth is expected to remain subdued in H1 2025
- Competitive pressure on deposits remains high, potentially squeezing margins
- Bank of Canada rate cuts may provide some relief but timing remains uncertain</code></pre><p><strong>Claude 3.7 - extended thinking mode</strong></p><pre><code>Beneficial Rate Environment Emerging
- Central banks have begun easing cycles, with more cuts expected in 2025
- BMO management specifically noted this should be positive for both economic activity and credit performance
- Peer commentary (RBC, BNS) supports expectations for mortgage market improvement and broader economic stabilization</code></pre><h4><strong>Analyzing recent performance</strong></h4><p><strong>Claude 3.5</strong></p><pre><code>Recent Performance vs Expectations:
- BMO has missed consensus estimates for three consecutive quarters
- Q4 2024 adjusted EPS declined to $1.90 from $2.93 year-over-year
- Return on Equity has fallen below target with management acknowledging it will take time to rebuild to 15% medium-term goal</code></pre><p><strong>Claude 3.7 - extended thinking mode</strong></p><pre><code>Capital Strength Enabling Shareholder Returns
- BMO has built exceptional capital strength with CET1 ratio reaching 13.6% in Q4 2024
- The bank announced both a dividend increase and a share repurchase program of up to 20 million shares (NCIB)
- January 2025 press releases confirm regulatory approvals have been received and the buyback is proceeding
- This capital return story provides support even if operating results are mixed

Positive Momentum in Core Businesses
- Q4 2024 delivered record pre-provision pre-tax earnings of $3.5 billion (up 8% YoY)
- Canadian P&amp;C business has shown consistent strength with record revenues
- U.S. segment's contribution has significantly increased post-Bank of the West integration
- Management indicated "net bullish" outlook for U.S. growth prospects in 2025</code></pre><p>Claude 3.7&#8217;s observations seem to be more forward-looking. Apparently this is what better investment reasoning looks like. </p><h2><strong>Interesting articles</strong></h2><p><strong>New paper details trading system based on LLMs + reinforcement learning</strong>. The authors incorporate an LLM that monitors changes in market sentiment to overcome the structured-data limitations of traditional RL-based trading strategies. </p><p>The paper compares results to o1, GPT-4o and other open-source models, and corroborates our conclusion that <a href="https://www.flatcircle.ai/p/flat-circle-contrasting-good-vs-poor">o1 outperforms other models</a>. However, all models appear to be beaten by the RL-LLM hybrid system discussed in this paper (<a href="https://arxiv.org/abs/2502.11433v3">arXiv</a>).</p><p><strong>Is AI really thinking or just pretending to?</strong> This is really the key question, and the article lays out the arguments on both sides. One good quote:</p><blockquote><p><em>The best use case is a situation where it&#8217;s hard for you to come up with a solution, but once you get a solution from the AI you can easily check to see if it&#8217;s correct. Writing code is a perfect example. 
Another example would be making a website: You can see what the AI produced and, if you don&#8217;t like it, just get the AI to redo it.</em></p></blockquote><p>&#8230; another example is measuring how the models perform in the market (<a href="https://www.vox.com/future-perfect/400531/ai-reasoning-models-openai-deepseek">Vox</a>)</p><p><strong>Two articles from late last year about Balyasny&#8217;s internal LLM tool:</strong></p><ul><li><p>Balyasny&#8217;s AI outperforms OpenAI in financial applications (<a href="https://www.hedgeweek.com/balyasnys-ai-outperforms-openai-in-financial-applications/">hedgeweek</a>)  </p></li><li><p>A day in the life of an applied AI engineer at Balyasny (<a href="http://A day in the life of an applied AI engineer at hedge fund Balyasny">efinancialcareers</a>)</p></li></ul><h2>Interesting LLM hedge fund job descriptions</h2><p><strong>Citadel: <a href="https://www.glassdoor.co.uk/job-listing/commodities-machine-learning-engineer-citadel-enterprise-americas-JV_IC2671300_KO0,37_KE38,65.htm?jl=1009553262036">Commodities - Machine Learning Engineer</a></strong> </p><blockquote><p><em>&#8220;Commodities have undergone an information revolution. From ship tracking to oil storage levels and crop yields, more data on supply, demand, storage, and transport is available than ever before. 
Commodity markets are more globally connected: natural gas markets impact fertilizer production, while agricultural markets impact gasoline production&#8230;We combine specialist domain expertise with advanced modeling techniques to solve problems that others deem unsolvable.&#8221;</em></p></blockquote><p><strong>DE Shaw: <a href="https://www.deshaw.com/careers/software-developer-generative-ai-5375">Software Developer - Generative AI</a> ($225K)</strong></p><blockquote><p><em>&#8220;Working on greenfield projects, which offer opportunities to shape the future of GAI at the firm and make a significant impact&#8221;</em></p></blockquote><p><strong>Millennium: <a href="https://aijobs.ai/job/senior-ai-engineer-equities-technology">Senior AI Engineer - Equities Technology</a> ($213K)</strong></p><blockquote><p><em>&#8220;We are building the next generation of Large Language Modeling applications driven by Portfolio Manager's requirements that provide immediate value and scale as a core product.&#8221;</em></p></blockquote><p><strong>Point72 (Cubist Systematics): <a href="https://job-boards.greenhouse.io/point72/jobs/7825077002">NLP Engineer</a></strong></p><blockquote><p><em>&#8220;Build state-of-the-art deep learning models to process large scale unstructured datasets.&#8221;</em></p></blockquote><h2>Follow how LLMs are beginning to make investment decisions</h2><p class="button-wrapper" data-attrs="{&quot;url&quot;:&quot;https://blog.flatcircle.ai/subscribe?&quot;,&quot;text&quot;:&quot;Subscribe now&quot;,&quot;action&quot;:null,&quot;class&quot;:null}" data-component-name="ButtonCreateButton"><a class="button primary" href="https://blog.flatcircle.ai/subscribe?"><span>Subscribe now</span></a></p><p><em>If you have feedback or would like to participate in this project, please reply to this email or reach out via <a href="https://x.com/jdmoran">X</a> or <a
href="https://www.linkedin.com/in/jamesdmoran/">LinkedIn</a>.</em></p>]]></content:encoded></item><item><title><![CDATA[Flat Circle - Contrasting good vs poor reasoning]]></title><description><![CDATA[Plus: Grok and o1 share the lead, 10 billion times more compute, more deep researchers]]></description><link>https://blog.flatcircle.ai/p/flat-circle-contrasting-good-vs-poor</link><guid isPermaLink="false">https://blog.flatcircle.ai/p/flat-circle-contrasting-good-vs-poor</guid><dc:creator><![CDATA[Jim Moran]]></dc:creator><pubDate>Thu, 20 Feb 2025 22:13:24 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!JkBV!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6b3c6635-a5ea-4ddd-8168-bd16ad3c27d3_1402x1002.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p><em>Flat Circle measures the ability of language models to predict company earnings results. See our <a href="https://www.flatcircle.ai/p/flat-circle-llm-benchmark-methodology">methodology</a> for detail and disclaimers. 
If you haven&#8217;t already subscribed, join investors and engineers interested in LLMs+investment research here:</em></p><p class="button-wrapper" data-attrs="{&quot;url&quot;:&quot;https://blog.flatcircle.ai/subscribe?&quot;,&quot;text&quot;:&quot;Subscribe now&quot;,&quot;action&quot;:null,&quot;class&quot;:null}" data-component-name="ButtonCreateButton"><a class="button primary" href="https://blog.flatcircle.ai/subscribe?"><span>Subscribe now</span></a></p><div><hr></div><h2>Key takeaways</h2><ul><li><p><strong>After 344 earnings, Grok-2 and o1 share the lead with a ~54% hitrate, earning ~1.1% per earnings</strong></p></li><li><p><strong>We contrast the reasoning approach of stronger vs weaker models regarding shareholder lawsuits</strong></p></li><li><p><strong>Perplexity and Grok announced their own competitors to ChatGPT Deep Research</strong></p><ul><li><p>Agentic research systems create another dimension on which LLMs may compete with human investors: they could either reason better or research better</p></li><li><p>Since Deep Researchers search online and can access historical information, it&#8217;s impossible to backtest their ability to make investment decisions. 
You need to test them live</p></li></ul></li></ul><h2>Model accuracy</h2><div class="captioned-image-container"><figure><a class="image-link image2" target="_blank" href="https://substackcdn.com/image/fetch/$s_!Pp8i!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffeb9385d-f185-4f55-8c91-1673fca33514_951x237.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!Pp8i!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffeb9385d-f185-4f55-8c91-1673fca33514_951x237.png 424w, https://substackcdn.com/image/fetch/$s_!Pp8i!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffeb9385d-f185-4f55-8c91-1673fca33514_951x237.png 848w, https://substackcdn.com/image/fetch/$s_!Pp8i!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffeb9385d-f185-4f55-8c91-1673fca33514_951x237.png 1272w, https://substackcdn.com/image/fetch/$s_!Pp8i!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffeb9385d-f185-4f55-8c91-1673fca33514_951x237.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!Pp8i!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffeb9385d-f185-4f55-8c91-1673fca33514_951x237.png" width="951" height="237" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/feb9385d-f185-4f55-8c91-1673fca33514_951x237.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:237,&quot;width&quot;:951,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:38374,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:&quot;https://www.flatcircle.ai/i/157102539?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffeb9385d-f185-4f55-8c91-1673fca33514_951x237.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!Pp8i!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffeb9385d-f185-4f55-8c91-1673fca33514_951x237.png 424w, https://substackcdn.com/image/fetch/$s_!Pp8i!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffeb9385d-f185-4f55-8c91-1673fca33514_951x237.png 848w, https://substackcdn.com/image/fetch/$s_!Pp8i!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffeb9385d-f185-4f55-8c91-1673fca33514_951x237.png 1272w, https://substackcdn.com/image/fetch/$s_!Pp8i!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffeb9385d-f185-4f55-8c91-1673fca33514_951x237.png 1456w" sizes="100vw" fetchpriority="high"></picture><div></div></div></a></figure></div><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" 
href="https://substackcdn.com/image/fetch/$s_!JkBV!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6b3c6635-a5ea-4ddd-8168-bd16ad3c27d3_1402x1002.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!JkBV!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6b3c6635-a5ea-4ddd-8168-bd16ad3c27d3_1402x1002.png 424w, https://substackcdn.com/image/fetch/$s_!JkBV!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6b3c6635-a5ea-4ddd-8168-bd16ad3c27d3_1402x1002.png 848w, https://substackcdn.com/image/fetch/$s_!JkBV!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6b3c6635-a5ea-4ddd-8168-bd16ad3c27d3_1402x1002.png 1272w, https://substackcdn.com/image/fetch/$s_!JkBV!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6b3c6635-a5ea-4ddd-8168-bd16ad3c27d3_1402x1002.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!JkBV!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6b3c6635-a5ea-4ddd-8168-bd16ad3c27d3_1402x1002.png" width="1402" height="1002" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/6b3c6635-a5ea-4ddd-8168-bd16ad3c27d3_1402x1002.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:1002,&quot;width&quot;:1402,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:143263,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.flatcircle.ai/i/157102539?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6b3c6635-a5ea-4ddd-8168-bd16ad3c27d3_1402x1002.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!JkBV!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6b3c6635-a5ea-4ddd-8168-bd16ad3c27d3_1402x1002.png 424w, https://substackcdn.com/image/fetch/$s_!JkBV!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6b3c6635-a5ea-4ddd-8168-bd16ad3c27d3_1402x1002.png 848w, https://substackcdn.com/image/fetch/$s_!JkBV!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6b3c6635-a5ea-4ddd-8168-bd16ad3c27d3_1402x1002.png 1272w, https://substackcdn.com/image/fetch/$s_!JkBV!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6b3c6635-a5ea-4ddd-8168-bd16ad3c27d3_1402x1002.png 1456w" sizes="100vw"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" 
viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p>Are all models converging on a 50/50 coinflip? Two standard deviations above the mean of a random coin flipper is a 55% hitrate after 345 flips, so the leaders&#8217; ~54% hitrate is still within what chance alone could produce, and it seems worrisome that the hitrate has been trending down.</p><p>However, the mean profit per earnings remains very strong for o1 (113 bps) and Grok-2 (107 bps). o1 is by far the most advanced model according to other benchmarks and by far the most expensive in terms of tokens, so it seems promising that it is earning the most. We&#8217;ll see.</p><h2>Example of good vs poor reasoning?</h2><p>This project provides a way to quantify correct vs incorrect investor judgement. As an example, Pagaya (PGY) recently reported better-than-expected revenue and margins and provided overall certainty around its balance sheet.
</p><p>Gemini Flash Thinking, which loses 1.1% per earnings, had issued a SELL:</p><blockquote><pre><code>The emergence of securities fraud investigations significantly overshadows any positive historical trends or recent ABS deal announcements. While Pagaya has shown a pattern of exceeding guidance and achieving strong growth, the serious nature of securities fraud allegations creates substantial uncertainty and risk.  </code></pre></blockquote><p>We feed the models recent press releases, and apparently Gemini Flash Thinking had fixated on <a href="https://www.kxan.com/business/press-releases/accesswire/983536/investor-alert-pagaya-technologies-stock-drops-on-short-seller-report-investors-with-losses-encouraged-to-contact-kehoe-law-firm-p-c-pgy/">shareholder lawsuits</a> related to a <a href="https://iceberg-research.com/2025/02/11/pagaya-using-other-peoples-money-to-hide-massive-losses/">short seller report</a>.</p><p>o1, which earns 1.1% per earnings, was fed the same information. It accounted for the shareholder suits but concluded BUY:</p><blockquote><pre><code>While the wave of shareholder lawsuits and the Q3 share&#8208;price plunge indicate heightened uncertainty, the company&#8217;s expanding network of top-tier lending partners (including new bank and auto relationships) and its successful capital raises suggest solid underlying demand. In addition, management&#8217;s repeated emphasis on profitability in 2025, along with evidence of improving loan performance and rising fee margins, points to a business now positioned for stronger operating leverage as interest rates stabilize. With shares trading well off recent highs yet fundamentals and guidance still largely intact, the risk/reward profile appears favorable going into the February earnings release&#8212;making PGY a buy.</code></pre></blockquote><p>We don&#8217;t specify in the prompt whether to ignore shareholder lawsuit press releases. 
The returns teach us which reasoning is correct.</p><h2>Interesting articles</h2><p><strong>The amount of compute per request is going to skyrocket</strong>. This has major implications for datacenters and hyperscalers, and also for what can and will be spent on investment decisions:</p><blockquote><p><em>&#8220;&#8230;this single process from a single human interaction would involve 10 billion times more compute than a single human writing into ChatGPT today, at the exact same model size. That is the incredible expansion dynamic in inference compute that is playing out today and over the next few years!&#8221;</em></p></blockquote><p>(See &#8220;Inference Compute Scaling&#8221; on <a href="https://attune-ai.com/research">Attune Research</a>)</p><p><strong>Extensive thread on using ChatGPT Deep Research to create an investment thesis around DoorDash (DASH).</strong> Lots of great detail; I particularly like the multiple rounds with ChatGPT to create the optimal prompt:</p><blockquote><p><em>&#8220;I asked ChatGPT to build me a prompt for Deep Research to do Deep Research on Deep Research prompting. It read all the blogs and literature on best practices and gave me a thorough report. Then I asked for this to be turned into a prompt template for Deep Research. I've added it below. 
This routinely creates 3-5 page prompts that are generating 60-100 page, very thorough reports&#8221;</em></p></blockquote><p>(<a href="https://x.com/buccocapital/status/1891473002639868282">@BuccoCapital</a> on X)</p><p><strong>Grok-3 with DeepSearch announced</strong> (<a href="https://techcrunch.com/2025/02/17/elon-musks-ai-company-xai-releases-its-latest-flagship-ai-grok-3/">Techcrunch</a>)</p><p><strong>Perplexity launches Deep Research</strong> (<a href="https://www.perplexity.ai/hub/blog/introducing-perplexity-deep-research">Perplexity</a>)</p><p><strong>Hedge Fund that replaced analysts with AI beat the market</strong> (<a href="https://archive.is/6OoL2">Bloomberg</a>)</p><h2>Follow the progress of LLM investment research</h2><p class="button-wrapper" data-attrs="{&quot;url&quot;:&quot;https://blog.flatcircle.ai/subscribe?&quot;,&quot;text&quot;:&quot;Subscribe now&quot;,&quot;action&quot;:null,&quot;class&quot;:null}" data-component-name="ButtonCreateButton"><a class="button primary" href="https://blog.flatcircle.ai/subscribe?"><span>Subscribe now</span></a></p><p><em>If you have feedback or would like to participate in this project, please reply to this email or reach out via <a href="https://x.com/jdmoran">X</a> or <a href="https://www.linkedin.com/in/jamesdmoran/">LinkedIn</a>.</em></p>]]></content:encoded></item><item><title><![CDATA[Flat Circle - o1 now best performing model]]></title><description><![CDATA[Plus: the models are correlated, Deep Research + Deep Research]]></description><link>https://blog.flatcircle.ai/p/flat-circle-o1-now-best-performing</link><guid isPermaLink="false">https://blog.flatcircle.ai/p/flat-circle-o1-now-best-performing</guid><dc:creator><![CDATA[Jim Moran]]></dc:creator><pubDate>Tue, 11 Feb 2025 18:32:56 GMT</pubDate><enclosure 
url="https://substackcdn.com/image/fetch/$s_!g1jb!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4799598a-7f89-4c88-b83f-4fdef89ee391_999x308.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p><em>Flat Circle measures the ability of language models to predict company earnings results. See our <a href="https://www.flatcircle.ai/p/flat-circle-llm-benchmark-methodology">methodology</a> for detail and disclaimers. If you haven&#8217;t already subscribed, join investors and engineers interested in LLMs+investment research here:</em></p><p class="button-wrapper" data-attrs="{&quot;url&quot;:&quot;https://blog.flatcircle.ai/subscribe?&quot;,&quot;text&quot;:&quot;Subscribe now&quot;,&quot;action&quot;:null,&quot;class&quot;:null}" data-component-name="ButtonCreateButton"><a class="button primary" href="https://blog.flatcircle.ai/subscribe?"><span>Subscribe now</span></a></p><div><hr></div><h2>Key Takeaways</h2><ul><li><p><strong>After 209 live earnings, o1 now leads with a 57% hitrate and 130 bps mean return per earnings, followed by Grok-2 at a 55% hitrate</strong></p><ul><li><p>57% is approximately 2 standard deviations away from random chance after 209 coinflips</p></li><li><p>The Gemini and Claude models appear to be approaching 50/50</p></li></ul></li><li><p><strong>The models are fairly correlated with each other, tending to make the same calls</strong></p><ul><li><p>We&#8217;ll have to figure out ways to ensure model orthogonality before institutions start adopting LLMs to make investment decisions</p></li></ul></li><li><p><strong>I have recently spoken with a large number of readers and appreciate the helpful feedback</strong></p><ul><li><p>In addition to reporting on the leading language models and their ability to call company earnings, I plan to include other resources and news relevant to LLMs+investing</p></li></ul></li></ul><h2>Model correlation</h2><p>While the models show differing 
abilities, they are fairly correlated. LLMs are somewhat more likely to issue the same calls than if each model were merely flipping coins. </p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!g1jb!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4799598a-7f89-4c88-b83f-4fdef89ee391_999x308.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!g1jb!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4799598a-7f89-4c88-b83f-4fdef89ee391_999x308.png 424w, https://substackcdn.com/image/fetch/$s_!g1jb!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4799598a-7f89-4c88-b83f-4fdef89ee391_999x308.png 848w, https://substackcdn.com/image/fetch/$s_!g1jb!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4799598a-7f89-4c88-b83f-4fdef89ee391_999x308.png 1272w, https://substackcdn.com/image/fetch/$s_!g1jb!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4799598a-7f89-4c88-b83f-4fdef89ee391_999x308.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!g1jb!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4799598a-7f89-4c88-b83f-4fdef89ee391_999x308.png" width="999" height="308" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/4799598a-7f89-4c88-b83f-4fdef89ee391_999x308.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:308,&quot;width&quot;:999,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:96516,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!g1jb!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4799598a-7f89-4c88-b83f-4fdef89ee391_999x308.png 424w, https://substackcdn.com/image/fetch/$s_!g1jb!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4799598a-7f89-4c88-b83f-4fdef89ee391_999x308.png 848w, https://substackcdn.com/image/fetch/$s_!g1jb!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4799598a-7f89-4c88-b83f-4fdef89ee391_999x308.png 1272w, https://substackcdn.com/image/fetch/$s_!g1jb!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4799598a-7f89-4c88-b83f-4fdef89ee391_999x308.png 1456w" sizes="100vw" fetchpriority="high"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" 
xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p>This makes sense as they likely share much of the same training data, technology and methodology.</p><p>This also means the <a href="https://www.flatcircle.ai/p/flat-circle-are-we-merely-flipping">prior basis of comparison of 5 models flipping coins</a> was overly strict, as we are no longer talking about 5 independent models. After 209 earnings, o1&#8217;s hitrate is 57%. Two standard deviations above the mean of a single coin flipper is 57%, vs. 59% for 5 coin flippers.</p><p>I can&#8217;t deduce any pattern in why certain models are more or less correlated with others. The only thing I see is that the two newest &#8216;reasoning&#8217;-focused models, o1 and Gemini Flash Thinking, appear least correlated with the others. We&#8217;ll see if this trend continues.</p><p>Models&#8217; orthogonality, the extent to which they are uncorrelated, will be a crucial dimension in capital allocators&#8217; decisions to use them for trading. 
Orthogonality across managers and market factors is essential for risk management and leverage.</p><h2>Model accuracy</h2><div class="captioned-image-container"><figure><a class="image-link image2" target="_blank" href="https://substackcdn.com/image/fetch/$s_!Igjb!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fcc312bab-b02d-4651-864f-f40e979602dc_945x231.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!Igjb!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fcc312bab-b02d-4651-864f-f40e979602dc_945x231.png 424w, https://substackcdn.com/image/fetch/$s_!Igjb!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fcc312bab-b02d-4651-864f-f40e979602dc_945x231.png 848w, https://substackcdn.com/image/fetch/$s_!Igjb!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fcc312bab-b02d-4651-864f-f40e979602dc_945x231.png 1272w, https://substackcdn.com/image/fetch/$s_!Igjb!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fcc312bab-b02d-4651-864f-f40e979602dc_945x231.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!Igjb!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fcc312bab-b02d-4651-864f-f40e979602dc_945x231.png" width="945" height="231" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/cc312bab-b02d-4651-864f-f40e979602dc_945x231.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:231,&quot;width&quot;:945,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:38941,&quot;alt&quot;:&quot;&quot;,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" title="" srcset="https://substackcdn.com/image/fetch/$s_!Igjb!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fcc312bab-b02d-4651-864f-f40e979602dc_945x231.png 424w, https://substackcdn.com/image/fetch/$s_!Igjb!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fcc312bab-b02d-4651-864f-f40e979602dc_945x231.png 848w, https://substackcdn.com/image/fetch/$s_!Igjb!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fcc312bab-b02d-4651-864f-f40e979602dc_945x231.png 1272w, https://substackcdn.com/image/fetch/$s_!Igjb!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fcc312bab-b02d-4651-864f-f40e979602dc_945x231.png 1456w" sizes="100vw" loading="lazy"></picture><div></div></div></a></figure></div><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!RjcA!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc32de8c0-1488-48f1-af9a-f76e9c879782_1400x996.png" 
data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!RjcA!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc32de8c0-1488-48f1-af9a-f76e9c879782_1400x996.png 424w, https://substackcdn.com/image/fetch/$s_!RjcA!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc32de8c0-1488-48f1-af9a-f76e9c879782_1400x996.png 848w, https://substackcdn.com/image/fetch/$s_!RjcA!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc32de8c0-1488-48f1-af9a-f76e9c879782_1400x996.png 1272w, https://substackcdn.com/image/fetch/$s_!RjcA!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc32de8c0-1488-48f1-af9a-f76e9c879782_1400x996.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!RjcA!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc32de8c0-1488-48f1-af9a-f76e9c879782_1400x996.png" width="1400" height="996" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/c32de8c0-1488-48f1-af9a-f76e9c879782_1400x996.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:996,&quot;width&quot;:1400,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:151633,&quot;alt&quot;:&quot;&quot;,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" title="" 
srcset="https://substackcdn.com/image/fetch/$s_!RjcA!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc32de8c0-1488-48f1-af9a-f76e9c879782_1400x996.png 424w, https://substackcdn.com/image/fetch/$s_!RjcA!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc32de8c0-1488-48f1-af9a-f76e9c879782_1400x996.png 848w, https://substackcdn.com/image/fetch/$s_!RjcA!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc32de8c0-1488-48f1-af9a-f76e9c879782_1400x996.png 1272w, https://substackcdn.com/image/fetch/$s_!RjcA!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc32de8c0-1488-48f1-af9a-f76e9c879782_1400x996.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" 
stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><h2>Comparison to other benchmarks</h2><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!BNKv!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa8ce070b-fefe-4d3d-9f86-f405339d56ee_512x289.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!BNKv!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa8ce070b-fefe-4d3d-9f86-f405339d56ee_512x289.png 424w, https://substackcdn.com/image/fetch/$s_!BNKv!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa8ce070b-fefe-4d3d-9f86-f405339d56ee_512x289.png 848w, https://substackcdn.com/image/fetch/$s_!BNKv!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa8ce070b-fefe-4d3d-9f86-f405339d56ee_512x289.png 1272w, https://substackcdn.com/image/fetch/$s_!BNKv!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa8ce070b-fefe-4d3d-9f86-f405339d56ee_512x289.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!BNKv!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa8ce070b-fefe-4d3d-9f86-f405339d56ee_512x289.png" width="512" height="289" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/a8ce070b-fefe-4d3d-9f86-f405339d56ee_512x289.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:289,&quot;width&quot;:512,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:40699,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!BNKv!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa8ce070b-fefe-4d3d-9f86-f405339d56ee_512x289.png 424w, https://substackcdn.com/image/fetch/$s_!BNKv!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa8ce070b-fefe-4d3d-9f86-f405339d56ee_512x289.png 848w, https://substackcdn.com/image/fetch/$s_!BNKv!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa8ce070b-fefe-4d3d-9f86-f405339d56ee_512x289.png 1272w, https://substackcdn.com/image/fetch/$s_!BNKv!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa8ce070b-fefe-4d3d-9f86-f405339d56ee_512x289.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" 
xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p>How does a model&#8217;s ability to call earnings compare to standard LLM benchmarks? We take each model&#8217;s reasoning score from <a href="https://livebench.ai/">LiveBench</a> and compare it with that model&#8217;s hitrate and mean share price return on live earnings calls.</p><p><strong>The models with the best share price return are those with the highest and lowest reasoning scores. 
The models in the middle underperform.</strong></p><blockquote><p>&#8220;A lot of smart people think they&#8217;re way smarter than they are, and therefore they do worse than dumb people&#8221; - Charlie Munger</p></blockquote><h2>Upcoming earnings calls</h2><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!syqb!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8b7fd8e0-5ed2-41d5-a6d1-58a5bf11653c_896x512.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!syqb!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8b7fd8e0-5ed2-41d5-a6d1-58a5bf11653c_896x512.png 424w, https://substackcdn.com/image/fetch/$s_!syqb!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8b7fd8e0-5ed2-41d5-a6d1-58a5bf11653c_896x512.png 848w, https://substackcdn.com/image/fetch/$s_!syqb!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8b7fd8e0-5ed2-41d5-a6d1-58a5bf11653c_896x512.png 1272w, https://substackcdn.com/image/fetch/$s_!syqb!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8b7fd8e0-5ed2-41d5-a6d1-58a5bf11653c_896x512.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!syqb!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8b7fd8e0-5ed2-41d5-a6d1-58a5bf11653c_896x512.png" width="896" height="512" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/8b7fd8e0-5ed2-41d5-a6d1-58a5bf11653c_896x512.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:512,&quot;width&quot;:896,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:75733,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!syqb!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8b7fd8e0-5ed2-41d5-a6d1-58a5bf11653c_896x512.png 424w, https://substackcdn.com/image/fetch/$s_!syqb!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8b7fd8e0-5ed2-41d5-a6d1-58a5bf11653c_896x512.png 848w, https://substackcdn.com/image/fetch/$s_!syqb!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8b7fd8e0-5ed2-41d5-a6d1-58a5bf11653c_896x512.png 1272w, https://substackcdn.com/image/fetch/$s_!syqb!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8b7fd8e0-5ed2-41d5-a6d1-58a5bf11653c_896x512.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" 
xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p>I&#8217;m debating whether to continue listing these upcoming earnings calls at all, since the hitrates are so close to 50%. Even the hitrates where BUY or SELL calls are unanimous aren&#8217;t meaningfully more predictive. If these calls are valuable or you would like a different display of them, please let me know. </p><h2>Industry news and updates</h2><p>I&#8217;ve spoken to a lot of readers over the past couple of weeks and am grateful for the feedback on this newsletter and the LLM systems we are building. If you and I haven&#8217;t spoken, please reach out!</p><p>For now, I plan to expand the scope of this newsletter to include news and resources generally relevant for the LLM+investors community. From this week:</p><p><strong>OpenAI Deep Research + Open Deep Research.</strong> OpenAI released a <a href="https://openai.com/index/introducing-deep-research/">new tool</a> that&#8217;s helpful for investment research. 
It&#8217;s exhilarating to input a query, watch it conduct searches, consider the results, think of new queries and so forth. A few days later, <a href="https://x.com/dzhng">David Zhang</a> launched an <a href="https://github.com/dzhng/deep-research">open source</a> version of Deep Research that already has 10K stars on GitHub. Excited to monitor development of these &#8216;research agents&#8217; and their application to investing. It seems a model&#8217;s ability to reason is inextricable from its ability to research. </p><p><strong>Model ML, LLM platform for PE and investment banks, <a href="https://archive.is/4SbYT">announces $12m funding round</a>. </strong>There have been dozens of these tools, but I thought the description of their approach was interesting: <em>&#8220;When you open Model ML, it looks a lot like Google Drive. It has its copycat versions of Excel, Powerpoint, Word, etc., which ensures that no information ever needs to leave the workspace.&#8221; </em> Of course, replicating Microsoft Office will be no small feat. But this approach seems similar to why OpenAI Deep Research is a better experience than <a href="https://openai.com/index/introducing-operator/">OpenAI Operator</a>. 
The fact that it&#8217;s fullstack means it can think around bottlenecks and get 10x as much done, instead of requiring the user to babysit.</p><h2>Follow the progress of LLM investment research</h2><p class="button-wrapper" data-attrs="{&quot;url&quot;:&quot;https://blog.flatcircle.ai/subscribe?&quot;,&quot;text&quot;:&quot;Subscribe now&quot;,&quot;action&quot;:null,&quot;class&quot;:null}" data-component-name="ButtonCreateButton"><a class="button primary" href="https://blog.flatcircle.ai/subscribe?"><span>Subscribe now</span></a></p><p><em>If you have feedback or would like to participate in this project, please reply to this email or reach out via <a href="https://x.com/jdmoran">X</a> or <a href="https://www.linkedin.com/in/jamesdmoran/">LinkedIn</a>.</em></p>]]></content:encoded></item><item><title><![CDATA[Flat Circle - Are we merely flipping coins?]]></title><description><![CDATA[Plus: adding new Gemini model, upgrading context template, risk analysis for Grok and Sonnet, and 16 upcoming earnings calls]]></description><link>https://blog.flatcircle.ai/p/flat-circle-are-we-merely-flipping</link><guid isPermaLink="false">https://blog.flatcircle.ai/p/flat-circle-are-we-merely-flipping</guid><dc:creator><![CDATA[Jim Moran]]></dc:creator><pubDate>Tue, 28 Jan 2025 17:00:49 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F58359752-1519-4a7c-928a-1a14e15e1e74_1108x811.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p><em>Flat Circle measures the ability of language models to predict company earnings results. See our <a href="https://www.flatcircle.ai/p/flat-circle-llm-benchmark-methodology">methodology</a> for detail and disclaimers. 
If you haven&#8217;t already subscribed, join investors and engineers interested in LLMs+investment research here:</em></p><p class="button-wrapper" data-attrs="{&quot;url&quot;:&quot;https://blog.flatcircle.ai/subscribe?&quot;,&quot;text&quot;:&quot;Subscribe now&quot;,&quot;action&quot;:null,&quot;class&quot;:null}" data-component-name="ButtonCreateButton"><a class="button primary" href="https://blog.flatcircle.ai/subscribe?"><span>Subscribe now</span></a></p><div><hr></div><h2>Key takeaways</h2><ul><li><p><strong>After 62 live earnings calls, Grok-2 remains in the lead with a 63% hitrate and a mean return of +171 bps</strong></p><ul><li><p>Claude Sonnet is in second place with a 60% hitrate and +119 bps per earnings call</p></li><li><p>In addition to having a high hitrate, Grok and Sonnet do the best job of limiting losses when they are wrong</p></li></ul></li><li><p><strong>Are the models merely flipping coins?</strong></p><ul><li><p>To put Grok&#8217;s 63% hitrate in context, we ran a thousand simulations of 5 models, each flipping a coin 62 times</p></li><li><p>63% is between 1 and 2 standard deviations above the mean. In other words, if the 5 models were guessing randomly, the best performing model would show a hitrate of at least 63% ~10% of the time</p></li></ul></li><li><p><strong>Our new context template, which determines the company information we feed each model, successfully improved results across all models</strong></p><ul><li><p>We show a comparative analysis below and are now using it as our primary context template</p></li><li><p>The new context includes market commentary, institutional holders and fewer past quarters</p></li></ul></li><li><p><strong>Google released its new Gemini <a href="https://ai.google.dev/gemini-api/docs/thinking">Flash Thinking</a> model, &#8220;capable of stronger reasoning capabilities&#8221;. 
</strong></p><ul><li><p>We have added this as an additional model and include its calls in the upcoming earnings section below</p></li></ul></li><li><p><strong>We plan to add DeepSeek, but the model has a much lower context limit so we need to rearchitect some things to create a fair comparison</strong></p></li></ul><h2>Recent earnings</h2><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!AfgI!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6234733f-20cb-4928-85a8-84b4b92c5eb9_797x943.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!AfgI!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6234733f-20cb-4928-85a8-84b4b92c5eb9_797x943.png 424w, https://substackcdn.com/image/fetch/$s_!AfgI!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6234733f-20cb-4928-85a8-84b4b92c5eb9_797x943.png 848w, https://substackcdn.com/image/fetch/$s_!AfgI!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6234733f-20cb-4928-85a8-84b4b92c5eb9_797x943.png 1272w, https://substackcdn.com/image/fetch/$s_!AfgI!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6234733f-20cb-4928-85a8-84b4b92c5eb9_797x943.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!AfgI!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6234733f-20cb-4928-85a8-84b4b92c5eb9_797x943.png" width="797" height="943" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/6234733f-20cb-4928-85a8-84b4b92c5eb9_797x943.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:943,&quot;width&quot;:797,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:160764,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!AfgI!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6234733f-20cb-4928-85a8-84b4b92c5eb9_797x943.png 424w, https://substackcdn.com/image/fetch/$s_!AfgI!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6234733f-20cb-4928-85a8-84b4b92c5eb9_797x943.png 848w, https://substackcdn.com/image/fetch/$s_!AfgI!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6234733f-20cb-4928-85a8-84b4b92c5eb9_797x943.png 1272w, https://substackcdn.com/image/fetch/$s_!AfgI!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6234733f-20cb-4928-85a8-84b4b92c5eb9_797x943.png 1456w" sizes="100vw" fetchpriority="high"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" 
xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><h2>Are we merely flipping coins?</h2><p>The hitrate of the best performing model (Grok-2) is 63% after 62 live earnings calls. Because we&#8217;re selecting the best result of 5 separate models, random guessing alone would often return results above 50%. 
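</p><p>One way to run this kind of check is a small Monte Carlo simulation. The Python sketch below is illustrative only: it assumes fair, independent coin flips, uses the figures quoted in this post (5 models, 62 calls, 1,000 trials), relies on hypothetical function names, and is not necessarily the code used for this newsletter.</p><pre><code>import random


def best_of_n_hitrates(n_models=5, n_calls=62, n_trials=1000, seed=0):
    """Simulate n_trials worlds in which every model guesses at random.

    Returns the best hitrate among the n_models in each simulated world.
    """
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    best = []
    for _ in range(n_trials):
        # Each model makes n_calls fair coin flips; a "hit" has probability 0.5.
        hitrates = [
            sum(rng.random() >= 0.5 for _ in range(n_calls)) / n_calls
            for _ in range(n_models)
        ]
        best.append(max(hitrates))
    return best


def share_at_or_above(best, threshold):
    """Fraction of simulated worlds whose best model reaches the threshold."""
    return sum(b >= threshold for b in best) / len(best)


best = best_of_n_hitrates()
# Assumption: a 63% hitrate corresponds to 39 correct calls out of 62.
print(share_at_or_above(best, 39 / 62))
</code></pre><p>The printed share depends on the random seed and on how 63% is rounded to a whole number of correct calls (39 of 62 is assumed here).</p><p>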
<strong>We checked by simulation and found that the best performing coin&#8208;flipper scores 63% or higher in ~10% of scenarios.</strong> In other words, Grok&#8217;s results are between 1 and 2 standard deviations above the mean if all five models were randomly guessing.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!jOJx!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F58359752-1519-4a7c-928a-1a14e15e1e74_1108x811.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!jOJx!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F58359752-1519-4a7c-928a-1a14e15e1e74_1108x811.png 424w, https://substackcdn.com/image/fetch/$s_!jOJx!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F58359752-1519-4a7c-928a-1a14e15e1e74_1108x811.png 848w, https://substackcdn.com/image/fetch/$s_!jOJx!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F58359752-1519-4a7c-928a-1a14e15e1e74_1108x811.png 1272w, https://substackcdn.com/image/fetch/$s_!jOJx!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F58359752-1519-4a7c-928a-1a14e15e1e74_1108x811.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!jOJx!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F58359752-1519-4a7c-928a-1a14e15e1e74_1108x811.png" width="1108" height="811" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/58359752-1519-4a7c-928a-1a14e15e1e74_1108x811.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:811,&quot;width&quot;:1108,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:100612,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!jOJx!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F58359752-1519-4a7c-928a-1a14e15e1e74_1108x811.png 424w, https://substackcdn.com/image/fetch/$s_!jOJx!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F58359752-1519-4a7c-928a-1a14e15e1e74_1108x811.png 848w, https://substackcdn.com/image/fetch/$s_!jOJx!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F58359752-1519-4a7c-928a-1a14e15e1e74_1108x811.png 1272w, https://substackcdn.com/image/fetch/$s_!jOJx!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F58359752-1519-4a7c-928a-1a14e15e1e74_1108x811.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" 
xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><h2>Improved context</h2><p>Our first attempt at giving each model more information when asking it to make a BUY or SELL call <a href="https://www.flatcircle.ai/p/flat-circle-more-information-lower">resulted in decreased performance</a>. 
Based on these learnings, we developed a second, improved context template for the models, which involves:</p><ul><li><p>adding market commentary for all historical quarters</p></li><li><p>including fewer historical quarters</p></li><li><p>adding institutional holders</p></li></ul><p>The new context template has demonstrated improved performance.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!aOYz!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F128ea9f3-b974-4566-8132-1606b0cf075a_600x435.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!aOYz!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F128ea9f3-b974-4566-8132-1606b0cf075a_600x435.png 424w, https://substackcdn.com/image/fetch/$s_!aOYz!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F128ea9f3-b974-4566-8132-1606b0cf075a_600x435.png 848w, https://substackcdn.com/image/fetch/$s_!aOYz!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F128ea9f3-b974-4566-8132-1606b0cf075a_600x435.png 1272w, https://substackcdn.com/image/fetch/$s_!aOYz!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F128ea9f3-b974-4566-8132-1606b0cf075a_600x435.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!aOYz!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F128ea9f3-b974-4566-8132-1606b0cf075a_600x435.png" width="600" height="435" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/128ea9f3-b974-4566-8132-1606b0cf075a_600x435.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:435,&quot;width&quot;:600,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:46831,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!aOYz!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F128ea9f3-b974-4566-8132-1606b0cf075a_600x435.png 424w, https://substackcdn.com/image/fetch/$s_!aOYz!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F128ea9f3-b974-4566-8132-1606b0cf075a_600x435.png 848w, https://substackcdn.com/image/fetch/$s_!aOYz!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F128ea9f3-b974-4566-8132-1606b0cf075a_600x435.png 1272w, https://substackcdn.com/image/fetch/$s_!aOYz!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F128ea9f3-b974-4566-8132-1606b0cf075a_600x435.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" 
xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p>A few examples of enhanced reasoning:</p><h4><strong>Case Study: AT&amp;T (T)</strong></h4><p>AT&amp;T rose 7% on Monday, driven by a beat on mobile-phone and fiber customers. Most models already pointed to BUY with both contexts, but the new context switched Claude Opus from SELL to BUY, driven by the following:</p><blockquote><pre><code>&#8220;At the recent Analyst Day in December 2024, management provided an encouraging long-term outlook, including plans to expand its fiber network to over 50 million locations. They guided for sustained growth in key metrics like service revenue, EBITDA, free cash flow and EPS.&#8221;</code></pre></blockquote><p>The company&#8217;s analyst day was not previously fed into the models&#8217; context and appears to be an incremental factor in Claude Opus&#8217; decision to BUY. We are working on more robust ways to add analyst day and conference presentations. 
</p><h4><strong>Case Study: HCA Healthcare (HCA)</strong></h4><p>HCA missed earnings Friday, declining 4%. While operating metrics improved, the miss was primarily due to greater-than-expected hurricane impacts. </p><p>Both o1 and Sonnet recommended BUY with the original context template and SELL with the new template.</p><p><strong>Old Context format</strong></p><p>o1 - BUY  </p><blockquote><pre><code>"The share price has underperformed due to short-term hurricane impacts, but HCA reaffirmed guidance and expects strong demand and stable margins into next year, positioning the stock for upside."</code></pre></blockquote><p>Sonnet - BUY</p><blockquote><pre><code>"While the share price has declined 17% since Q3 earnings due to hurricane impacts, the market appears to be overly focused on temporary disruptions rather than the strong underlying business performance and recovery trajectory. With operational metrics improving and hurricane impacts starting to anniversary, Q4 results are likely to demonstrate better-than-expected recovery and strong core performance, driving shares higher."</code></pre></blockquote><p><strong>New Context format</strong></p><p>o1 - SELL  </p><blockquote><pre><code>"With Q4 expected at the lower end of guidance amid continued hurricane-related headwinds, analysts lowering targets, and growing policy uncertainties (ACA subsidies), near-term risks appear elevated, making a weaker post-earnings share reaction more likely than upside."</code></pre></blockquote><p>Sonnet - SELL</p><blockquote><pre><code>"The combination of known hurricane headwinds, analyst concerns about policy/subsidy risks, and decelerating growth metrics suggests Q4 results and/or 2025 guidance could disappoint current expectations, likely leading to further share price weakness."</code></pre></blockquote><h4><strong>Case Study: Union Pacific (UNP)</strong></h4><p><strong>Old Context format - o1 - SELL</strong></p><blockquote><pre><code>&#8220;Despite Union Pacific&#8217;s solid 
operations and year-over-year results, recent quarters&#8217; earnings releases have disappointed investors and triggered stock price declines. Management has guided fourth-quarter results to be roughly in line with the third quarter, which appears below market expectations for stronger growth. As a result, the stock is at risk of another post-earnings pullback.&#8221;</code></pre></blockquote><p><strong>New Context format - o1 - BUY</strong></p><blockquote><pre><code>&#8220;Union Pacific has consistently improved its service reliability and operating efficiency, which is enabling volume gains (particularly in Intermodal) and driving solid pricing despite a soft freight environment. Management&#8217;s focus on cost control and a &#8216;buffer&#8217; approach to resources has delivered steady margin expansion and double-digit EPS growth even with negative mix from elevated international Intermodal volumes and ongoing coal weakness.&#8221;</code></pre></blockquote><p>The new context written pre-earnings was very similar to a post-earnings summary written by the sellside (BMO Capital Markets)&#8230;</p><blockquote><p>&#8220;UNP delivered industry leading improvement in opex (ex. fuel) &#8230;Efficiency gains can be seen across the network including better labor productivity, train length, and locomotive productivity. Service enhancement should prove accretive to volume in the coming years, and we expect cyclical tailwinds to drive strong EPS growth. F2025 EPS outlook for HSD to LDD growth despite a modest volume picture underscores the operating leverage opportunity of a more efficient network. Management sees further runway to optimize performance. 
Management is guiding to a mixed volume growth outlook amid coal headwinds, challenging international intermodal comps, and a varied economic backdrop.&#8221;</p></blockquote><h2><strong>Picking up pennies in front of a steamroller?</strong></h2><p>Some readers expressed concern that LLMs may tend to make the &#8216;consensus&#8217; call, causing them to have a higher than average hitrate but experience huge losses when they are wrong. </p><p>The <a href="https://www.flatcircle.ai/p/flat-circle-llm-benchmark-methodology">current prompt template</a> doesn&#8217;t instruct the models to factor in asymmetric upside or downside in their calls. Still, the models appear to differ in how they &#8220;risk-adjust:&#8221; </p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!wmIE!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F84a0410e-1865-42f1-af7e-5970b3e16562_659x462.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!wmIE!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F84a0410e-1865-42f1-af7e-5970b3e16562_659x462.png 424w, https://substackcdn.com/image/fetch/$s_!wmIE!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F84a0410e-1865-42f1-af7e-5970b3e16562_659x462.png 848w, https://substackcdn.com/image/fetch/$s_!wmIE!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F84a0410e-1865-42f1-af7e-5970b3e16562_659x462.png 1272w, 
https://substackcdn.com/image/fetch/$s_!wmIE!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F84a0410e-1865-42f1-af7e-5970b3e16562_659x462.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!wmIE!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F84a0410e-1865-42f1-af7e-5970b3e16562_659x462.png" width="659" height="462" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/84a0410e-1865-42f1-af7e-5970b3e16562_659x462.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:462,&quot;width&quot;:659,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:55879,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!wmIE!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F84a0410e-1865-42f1-af7e-5970b3e16562_659x462.png 424w, https://substackcdn.com/image/fetch/$s_!wmIE!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F84a0410e-1865-42f1-af7e-5970b3e16562_659x462.png 848w, https://substackcdn.com/image/fetch/$s_!wmIE!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F84a0410e-1865-42f1-af7e-5970b3e16562_659x462.png 1272w, 
https://substackcdn.com/image/fetch/$s_!wmIE!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F84a0410e-1865-42f1-af7e-5970b3e16562_659x462.png 1456w" sizes="100vw" loading="lazy"></picture></div></a></figure></div><p>Note: we added Claude Sonnet on January 13, so it missed the painful Walgreens earnings and thrilling Radius Recycling earnings impacting the other models.</p><p>We will watch how the gains and losses compare as the earnings roll through and experiment with incorporating risk more explicitly in our prompt. 
</p><h2>Upcoming earnings</h2><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!csK4!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F607934e2-b56f-4dcf-aaad-0bdb8c0f9394_789x651.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!csK4!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F607934e2-b56f-4dcf-aaad-0bdb8c0f9394_789x651.png 424w, https://substackcdn.com/image/fetch/$s_!csK4!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F607934e2-b56f-4dcf-aaad-0bdb8c0f9394_789x651.png 848w, https://substackcdn.com/image/fetch/$s_!csK4!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F607934e2-b56f-4dcf-aaad-0bdb8c0f9394_789x651.png 1272w, https://substackcdn.com/image/fetch/$s_!csK4!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F607934e2-b56f-4dcf-aaad-0bdb8c0f9394_789x651.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!csK4!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F607934e2-b56f-4dcf-aaad-0bdb8c0f9394_789x651.png" width="789" height="651" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/607934e2-b56f-4dcf-aaad-0bdb8c0f9394_789x651.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:651,&quot;width&quot;:789,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:103820,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!csK4!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F607934e2-b56f-4dcf-aaad-0bdb8c0f9394_789x651.png 424w, https://substackcdn.com/image/fetch/$s_!csK4!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F607934e2-b56f-4dcf-aaad-0bdb8c0f9394_789x651.png 848w, https://substackcdn.com/image/fetch/$s_!csK4!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F607934e2-b56f-4dcf-aaad-0bdb8c0f9394_789x651.png 1272w, https://substackcdn.com/image/fetch/$s_!csK4!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F607934e2-b56f-4dcf-aaad-0bdb8c0f9394_789x651.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" 
xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p>The reasoning behind these calls is available <a href="https://drive.google.com/file/d/1qyFuC1TWeMUzugkjOc3VB9toMBoQMAM3/view?usp=sharing">here</a>.</p><h2>Follow the progress of LLM investment research</h2><p class="button-wrapper" data-attrs="{&quot;url&quot;:&quot;https://blog.flatcircle.ai/subscribe?&quot;,&quot;text&quot;:&quot;Subscribe now&quot;,&quot;action&quot;:null,&quot;class&quot;:null}" data-component-name="ButtonCreateButton"><a class="button primary" href="https://blog.flatcircle.ai/subscribe?"><span>Subscribe now</span></a></p><p><em>If you have feedback or would like to participate in this project, please reply to this email or reach out via <a href="https://x.com/jdmoran">X</a> or <a href="https://www.linkedin.com/in/jamesdmoran/">LinkedIn</a>.</em></p>]]></content:encoded></item><item><title><![CDATA[Flat Circle - More information, lower accuracy? 
]]></title><description><![CDATA[Plus: Two research papers, 7 upcoming earnings, another system upgrade]]></description><link>https://blog.flatcircle.ai/p/flat-circle-more-information-lower</link><guid isPermaLink="false">https://blog.flatcircle.ai/p/flat-circle-more-information-lower</guid><dc:creator><![CDATA[Jim Moran]]></dc:creator><pubDate>Wed, 22 Jan 2025 03:40:26 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!7X8t!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F25a5fe13-71e7-4abe-b48f-f1657ffb1d7a_787x664.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p><em>Flat Circle measures the ability of language models to predict company earnings results. See our <a href="https://www.flatcircle.ai/p/flat-circle-llm-benchmark-methodology">methodology</a> for detail and disclaimers. If you haven&#8217;t already subscribed, join investors and engineers interested in LLMs+investment research here:</em></p><p class="button-wrapper" data-attrs="{&quot;url&quot;:&quot;https://blog.flatcircle.ai/subscribe?&quot;,&quot;text&quot;:&quot;Subscribe now&quot;,&quot;action&quot;:null,&quot;class&quot;:null}" data-component-name="ButtonCreateButton"><a class="button primary" href="https://blog.flatcircle.ai/subscribe?"><span>Subscribe now</span></a></p><div><hr></div><h2>Key takeaways</h2><ul><li><p><strong>After 37 earnings results, Grok-2 still the most accurate model with a 65% hitrate and a mean return of +230 bps per earnings</strong> </p><ul><li><p>Anthropic&#8217;s models are at ~60% hitrate and 110 - 150 bps mean return per earnings</p></li><li><p>Gemini and OpenAI are both slightly below 50% with slightly negative mean return per earnings</p></li><li><p>This represents a slight decline in performance for Grok and Anthropic and a slight improvement for Gemini and OpenAI vs the <a href="https://www.flatcircle.ai/p/grok-and-anthropic-calling-more-than">previous 
update</a></p></li></ul></li><li><p><strong>Our first attempt at feeding the models additional information led to decreased accuracy</strong></p><ul><li><p>Last week, we built a new version of the template that determines what company information we provide each model, and we have been running the two versions in parallel to see which is more accurate. </p></li><li><p>The new version added market commentary and additional historical quarters</p></li><li><p><strong>Surprisingly, this &#8220;improved&#8221; context produced worse performance. </strong>We discuss theories below</p></li></ul></li><li><p><strong>Based on these learnings, we have developed &#8220;v3&#8221; of the context template and are running a new test to see if accuracy improves</strong></p><ul><li><p>The v3 template includes current and past institutional holders, which may have an impact on post-earnings price action</p></li><li><p>We are testing this new version in parallel and will roll it out once we see consistent performance improvement across most models</p></li></ul></li></ul><h2>Recent earnings</h2><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!7X8t!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F25a5fe13-71e7-4abe-b48f-f1657ffb1d7a_787x664.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!7X8t!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F25a5fe13-71e7-4abe-b48f-f1657ffb1d7a_787x664.png 424w, https://substackcdn.com/image/fetch/$s_!7X8t!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F25a5fe13-71e7-4abe-b48f-f1657ffb1d7a_787x664.png 848w, 
https://substackcdn.com/image/fetch/$s_!7X8t!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F25a5fe13-71e7-4abe-b48f-f1657ffb1d7a_787x664.png 1272w, https://substackcdn.com/image/fetch/$s_!7X8t!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F25a5fe13-71e7-4abe-b48f-f1657ffb1d7a_787x664.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!7X8t!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F25a5fe13-71e7-4abe-b48f-f1657ffb1d7a_787x664.png" width="787" height="664" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/25a5fe13-71e7-4abe-b48f-f1657ffb1d7a_787x664.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:664,&quot;width&quot;:787,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:105427,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!7X8t!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F25a5fe13-71e7-4abe-b48f-f1657ffb1d7a_787x664.png 424w, https://substackcdn.com/image/fetch/$s_!7X8t!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F25a5fe13-71e7-4abe-b48f-f1657ffb1d7a_787x664.png 848w, 
https://substackcdn.com/image/fetch/$s_!7X8t!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F25a5fe13-71e7-4abe-b48f-f1657ffb1d7a_787x664.png 1272w, https://substackcdn.com/image/fetch/$s_!7X8t!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F25a5fe13-71e7-4abe-b48f-f1657ffb1d7a_787x664.png 1456w" sizes="100vw" fetchpriority="high"></picture></div></a></figure></div><h2>More information &#8212;&gt; less accuracy?</h2><p>Our system feeds each model the same information (&#8220;Context&#8221;) when 
asking it to make a BUY or SELL call about a given company&#8217;s earnings. Our original Context template includes the following information for the past few quarters:</p><ul><li><p>Share price performance vs the S&amp;P</p></li><li><p>Press releases</p></li><li><p>Company and peer earnings transcripts</p></li><li><p>Sellside upgrades and downgrades</p></li><li><p>Share price performance during the quarter</p></li><li><p>Share price reactions from past earnings</p></li></ul><p><a href="https://www.flatcircle.ai/p/grok-and-anthropic-calling-more-than">Last week</a>, we began testing an improved Context template that included the above plus:</p><ul><li><p>Many more historical quarters of this information</p></li><li><p>Market commentary explaining key price moves in the most recent quarter</p></li></ul><p><strong>These changes led to decreased performance overall</strong>:</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!M7tw!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F119a1ee1-e5f9-4371-bec2-e6c3976efe85_599x421.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!M7tw!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F119a1ee1-e5f9-4371-bec2-e6c3976efe85_599x421.png 424w, https://substackcdn.com/image/fetch/$s_!M7tw!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F119a1ee1-e5f9-4371-bec2-e6c3976efe85_599x421.png 848w, https://substackcdn.com/image/fetch/$s_!M7tw!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F119a1ee1-e5f9-4371-bec2-e6c3976efe85_599x421.png 1272w, 
https://substackcdn.com/image/fetch/$s_!M7tw!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F119a1ee1-e5f9-4371-bec2-e6c3976efe85_599x421.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!M7tw!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F119a1ee1-e5f9-4371-bec2-e6c3976efe85_599x421.png" width="599" height="421" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/119a1ee1-e5f9-4371-bec2-e6c3976efe85_599x421.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:421,&quot;width&quot;:599,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:45717,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!M7tw!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F119a1ee1-e5f9-4371-bec2-e6c3976efe85_599x421.png 424w, https://substackcdn.com/image/fetch/$s_!M7tw!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F119a1ee1-e5f9-4371-bec2-e6c3976efe85_599x421.png 848w, https://substackcdn.com/image/fetch/$s_!M7tw!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F119a1ee1-e5f9-4371-bec2-e6c3976efe85_599x421.png 1272w, 
https://substackcdn.com/image/fetch/$s_!M7tw!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F119a1ee1-e5f9-4371-bec2-e6c3976efe85_599x421.png 1456w" sizes="100vw" loading="lazy"></picture></div></a></figure></div><p>Two theories:</p><ul><li><p>Some research shows <a href="https://arxiv.org/pdf/2402.14848">decreased reasoning performance with longer context lengths</a>. 
Adding historical quarters may therefore have been counterproductive</p></li><li><p>Some research shows <a href="https://arxiv.org/pdf/2410.12464">weaker models outperforming stronger models in trading decisions during bull markets</a>, as weaker models tend to overweight subjective commentary: </p><blockquote><p><em>&#8220;&#8230;stronger LLM tends to focus more on the facts while the weaker LLM give more weight to subjective news. However &#8230; the increased reasoning ability does not bring a higher return in the cryptocurrency trading. This outcome aligns with economic theory, which suggests that typical market participants are only partially rational, with investors driven by emotional and psychological factors that push asset prices far beyond stock&#8217;s intrinsic value&#8230;&#8221;</em></p></blockquote><p>This is one possible explanation for why the lower-performing models (i.e., Gemini and Claude Opus) improved while the stronger models did not.  </p></li></ul><h2>Next attempt: adding institutional holders, more market commentary, fewer quarters</h2><p>Based on these learnings, we are testing a new context template in parallel:</p><ul><li><p>Pulling in current and past institutional holders, often a factor in price action following earnings</p></li><li><p>Pulling in market commentary again, this time for all past quarters</p></li><li><p>Reducing the total number of historical quarters of information provided</p></li></ul><p>We will compare the results in a future update.</p><h2>Upcoming earnings</h2><p>Note that the following earnings estimates are derived from the original Context template, discussed above.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!F05O!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F67fe9749-2f7d-4a58-bad1-33765b813ecf_688x417.png" 
data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!F05O!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F67fe9749-2f7d-4a58-bad1-33765b813ecf_688x417.png 424w, https://substackcdn.com/image/fetch/$s_!F05O!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F67fe9749-2f7d-4a58-bad1-33765b813ecf_688x417.png 848w, https://substackcdn.com/image/fetch/$s_!F05O!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F67fe9749-2f7d-4a58-bad1-33765b813ecf_688x417.png 1272w, https://substackcdn.com/image/fetch/$s_!F05O!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F67fe9749-2f7d-4a58-bad1-33765b813ecf_688x417.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!F05O!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F67fe9749-2f7d-4a58-bad1-33765b813ecf_688x417.png" width="688" height="417" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/67fe9749-2f7d-4a58-bad1-33765b813ecf_688x417.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:417,&quot;width&quot;:688,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:58526,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" 
srcset="https://substackcdn.com/image/fetch/$s_!F05O!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F67fe9749-2f7d-4a58-bad1-33765b813ecf_688x417.png 424w, https://substackcdn.com/image/fetch/$s_!F05O!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F67fe9749-2f7d-4a58-bad1-33765b813ecf_688x417.png 848w, https://substackcdn.com/image/fetch/$s_!F05O!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F67fe9749-2f7d-4a58-bad1-33765b813ecf_688x417.png 1272w, https://substackcdn.com/image/fetch/$s_!F05O!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F67fe9749-2f7d-4a58-bad1-33765b813ecf_688x417.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" 
stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p>The reasoning behind these earnings calls is available <a href="https://drive.google.com/file/d/1-08LBWMPKcO5gH9hNReJ5Mc3DWZv592z/view?usp=sharing">here</a>.</p><p><em>If you have feedback or would like to participate in this project, please reply to this email or reach out via <a href="https://x.com/jdmoran">X</a> or <a href="https://www.linkedin.com/in/jamesdmoran/">LinkedIn</a>.</em></p><h2>Follow the progress of LLM investment research</h2><p class="button-wrapper" data-attrs="{&quot;url&quot;:&quot;https://blog.flatcircle.ai/subscribe?&quot;,&quot;text&quot;:&quot;Subscribe now&quot;,&quot;action&quot;:null,&quot;class&quot;:null}" data-component-name="ButtonCreateButton"><a class="button primary" href="https://blog.flatcircle.ai/subscribe?"><span>Subscribe now</span></a></p><p></p><p></p><p></p>]]></content:encoded></item><item><title><![CDATA[Grok and Anthropic calling more than two thirds of earnings correctly]]></title><description><![CDATA[Plus: 11 upcoming earnings and several upgrades to our system]]></description><link>https://blog.flatcircle.ai/p/grok-and-anthropic-calling-more-than</link><guid isPermaLink="false">https://blog.flatcircle.ai/p/grok-and-anthropic-calling-more-than</guid><dc:creator><![CDATA[Jim Moran]]></dc:creator><pubDate>Thu, 16 Jan 2025 00:44:49 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!6tZZ!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fcdd61cdf-e338-4853-ac24-4e1d4de29eba_692x520.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p><em>Flat Circle measures the ability of language models to predict company 
earnings results. See our <a href="https://www.flatcircle.ai/p/flat-circle-llm-benchmark-methodology">methodology</a> for details and disclaimers. If you haven&#8217;t already subscribed, join investors and engineers interested in LLMs+investment research here:</em></p><p class="button-wrapper" data-attrs="{&quot;url&quot;:&quot;https://blog.flatcircle.ai/subscribe?&quot;,&quot;text&quot;:&quot;Subscribe now&quot;,&quot;action&quot;:null,&quot;class&quot;:null}" data-component-name="ButtonCreateButton"><a class="button primary" href="https://blog.flatcircle.ai/subscribe?"><span>Subscribe now</span></a></p><div><hr></div><h2>Key takeaways</h2><ul><li><p><strong>After 22 earnings calls, Grok and Anthropic lead earnings accuracy, calling</strong> <strong>more than two thirds of earnings correctly</strong> <strong>with a mean return of 230 to 300 bps per earnings</strong></p><ul><li><p>The worst-performing model is OpenAI&#8217;s o1, with a hit rate of 41% and a total return of -38%</p></li></ul></li><li><p>We made significant improvements to the Context provided to each model:</p><ul><li><p>It now includes market commentary explaining key price moves during the quarter</p></li><li><p>We are running the old and new Contexts in parallel and will roll out the new one once we&#8217;ve confirmed that it improves overall earnings accuracy</p></li></ul></li><li><p>OpenAI o1 exited preview mode, so we&#8217;ve switched over to the current <a href="https://platform.openai.com/docs/guides/reasoning">o1-2024-12-17 model</a>. 
This allows us to specify a reasoning_effort parameter, which we&#8217;ve set to &#8220;high&#8221;</p></li></ul><h2>Upcoming earnings</h2><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!6tZZ!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fcdd61cdf-e338-4853-ac24-4e1d4de29eba_692x520.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!6tZZ!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fcdd61cdf-e338-4853-ac24-4e1d4de29eba_692x520.png 424w, https://substackcdn.com/image/fetch/$s_!6tZZ!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fcdd61cdf-e338-4853-ac24-4e1d4de29eba_692x520.png 848w, https://substackcdn.com/image/fetch/$s_!6tZZ!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fcdd61cdf-e338-4853-ac24-4e1d4de29eba_692x520.png 1272w, https://substackcdn.com/image/fetch/$s_!6tZZ!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fcdd61cdf-e338-4853-ac24-4e1d4de29eba_692x520.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!6tZZ!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fcdd61cdf-e338-4853-ac24-4e1d4de29eba_692x520.png" width="692" height="520" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/cdd61cdf-e338-4853-ac24-4e1d4de29eba_692x520.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:520,&quot;width&quot;:692,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:74447,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!6tZZ!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fcdd61cdf-e338-4853-ac24-4e1d4de29eba_692x520.png 424w, https://substackcdn.com/image/fetch/$s_!6tZZ!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fcdd61cdf-e338-4853-ac24-4e1d4de29eba_692x520.png 848w, https://substackcdn.com/image/fetch/$s_!6tZZ!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fcdd61cdf-e338-4853-ac24-4e1d4de29eba_692x520.png 1272w, https://substackcdn.com/image/fetch/$s_!6tZZ!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fcdd61cdf-e338-4853-ac24-4e1d4de29eba_692x520.png 1456w" sizes="100vw" fetchpriority="high"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" 
xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><h2>Recent earnings</h2><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!1LCw!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb6532eea-c67b-4def-96ef-922e92edac33_799x833.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!1LCw!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb6532eea-c67b-4def-96ef-922e92edac33_799x833.png 424w, https://substackcdn.com/image/fetch/$s_!1LCw!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb6532eea-c67b-4def-96ef-922e92edac33_799x833.png 848w, 
https://substackcdn.com/image/fetch/$s_!1LCw!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb6532eea-c67b-4def-96ef-922e92edac33_799x833.png 1272w, https://substackcdn.com/image/fetch/$s_!1LCw!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb6532eea-c67b-4def-96ef-922e92edac33_799x833.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!1LCw!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb6532eea-c67b-4def-96ef-922e92edac33_799x833.png" width="799" height="833" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/b6532eea-c67b-4def-96ef-922e92edac33_799x833.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:833,&quot;width&quot;:799,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:139638,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!1LCw!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb6532eea-c67b-4def-96ef-922e92edac33_799x833.png 424w, https://substackcdn.com/image/fetch/$s_!1LCw!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb6532eea-c67b-4def-96ef-922e92edac33_799x833.png 848w, 
https://substackcdn.com/image/fetch/$s_!1LCw!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb6532eea-c67b-4def-96ef-922e92edac33_799x833.png 1272w, https://substackcdn.com/image/fetch/$s_!1LCw!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb6532eea-c67b-4def-96ef-922e92edac33_799x833.png 1456w" sizes="100vw"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p><strong>Applied Digital (APLD)</strong></p><p>Claude&#8217;s new Sonnet model correctly called SELL while its older Opus model called BUY. 
We discuss the differences in the reasoning in the <a href="https://www.flatcircle.ai/p/flat-circle-learnings-from-new-model">prior post</a>, where Sonnet correctly showed skepticism about the timeline of the Ellendale announcement. However, between when the SELL call was issued and the earnings release, the company announced an <a href="https://www.reuters.com/business/finance/macquarie-invest-up-5-bln-applied-digital-data-centers-wsj-reports-2025-01-14/">investment by Macquarie</a>, causing the stock to rise the day before earnings. Because the <a href="https://www.flatcircle.ai/p/flat-circle-llm-benchmark-methodology">methodology</a> uses only the trading day following earnings, that interim gain is not reflected here.</p><p><strong>Citi (C)</strong> </p><p>The company posted stronger-than-expected markets and banking revenue, as well as a larger-than-expected buyback. Gemini and Grok each made solid points in their pre-earnings calls:</p><p><strong>Gemini 2.0 Flash Experimental</strong></p><blockquote><pre><code>The bank has also been able to close another consent order this quarter. Given this momentum, it is likely that the bank will meet or exceed expectations. The prior quarter stock decline was not driven by the bank's performance, rather, by an external regulatory matter, and this has been addressed.</code></pre></blockquote><p><strong>Grok</strong></p><blockquote><pre><code>Our analysis of Citigroup Inc.&#8217;s performance over the past four quarters supports a BUY recommendation&#8230; with consistent revenue growth across its core businesses, notably strong performance in services, banking, and wealth</code></pre></blockquote><h2>Improvements to Context template</h2><p>The system provides the same Context to each model for each call. 
This week, we added several improvements to the Context, including:</p><ul><li><p>Market commentary explaining key price moves during the quarter</p></li></ul><p>We are testing the new Context in parallel and will roll it out once we&#8217;ve confirmed it improves overall accuracy across all the models.</p><h4>Contrasting calls made with old vs new Context</h4><p><strong>JB Hunt (JBHT) - Reports tomorrow after market close</strong></p><p><em>Claude 3.5 Sonnet - Old Context - SELL</em></p><blockquote><pre><code>While there are some positives like improving safety metrics and bid compliance rates, the pricing and volume headwinds appear likely to drive another quarter of year-over-year profit declines. </code></pre></blockquote><p><em>Claude 3.5 Sonnet - New Context - BUY</em></p><blockquote><pre><code>While pricing pressure persists, particularly in Intermodal contracts through H1 2025, the stock appears to be looking past this near-term headwind&#8230;The main risk is continued pricing pressure, but this appears largely priced in at current levels given management's consistent messaging on this headwind persisting through H1 2025.</code></pre></blockquote><p><strong>Taiwan Semiconductor (TSM) - Reports tomorrow before market open</strong></p><p><em>OpenAI o1 - Old Context - BUY</em></p><blockquote><pre><code>TSM's share price has underperformed the S&amp;P by -1.84% during the quarter, indicating low market expectations. Historically, when TSM underperformed or only slightly outperformed the S&amp;P during the quarter, the stock experienced a positive price reaction after earnings (e.g., +9.79% and +9.80% in previous similar quarters). 
Given this pattern, there is a strong likelihood that TSM's stock price will rise following the upcoming earnings, making it a good opportunity to BUY.</code></pre></blockquote><p><em>OpenAI o1 - New Context - BUY</em></p><blockquote><pre><code>TSMC&#8217;s AI-related revenue has surged to represent a mid-teens percentage of total sales this year, with management expecting it to triple and reach an even higher share next year. Strong demand for 3nm and 5nm smartphone and HPC/AI applications is driving utilization upward, lifting gross margins above 57%. Moreover, TSMC continues to pass along higher costs to customers while maintaining industry-leading technology and capacity at scale, reinforcing its pricing power. These factors, plus the company&#8217;s increased full-year revenue outlook (near +30% in USD terms), suggest TSMC&#8217;s shares are poised to close higher following earnings.  </code></pre></blockquote><p><em>If you have feedback or would like to participate in this project, please reply to this email or reach out via <a href="https://x.com/jdmoran">X</a> or <a href="https://www.linkedin.com/in/jamesdmoran/">LinkedIn</a>.</em></p><h2>Follow the progress of LLM investment research</h2><p class="button-wrapper" data-attrs="{&quot;url&quot;:&quot;https://blog.flatcircle.ai/subscribe?&quot;,&quot;text&quot;:&quot;Subscribe now&quot;,&quot;action&quot;:null,&quot;class&quot;:null}" data-component-name="ButtonCreateButton"><a class="button primary" href="https://blog.flatcircle.ai/subscribe?"><span>Subscribe now</span></a></p><p></p>]]></content:encoded></item></channel></rss>