Right now, the system returns podcast excerpts only when it judges it has found something incremental for investors, which means it likely drops good material that it mistranscribed in the first place.
You make a good point that evals are hard for long-tail, deep-research tasks where the LLM is looking for a needle in a haystack. You can't just hand a random 1% of documents to a human expert when you expect something material in only 0.01%-0.0001% of documents and don't know the distribution. It seems LLMs won't transform scuttlebutt workflows until we figure this out.
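To make the base-rate problem concrete, here's a quick back-of-the-envelope sketch (the corpus size and hit rates are illustrative assumptions, not numbers from any real workflow):

```python
# Illustrative base-rate math: why handing a random 1% of documents to a
# human reviewer fails when material findings are extremely rare.
corpus_size = 1_000_000          # hypothetical corpus size
sample_fraction = 0.01           # reviewer sees a random 1%
sample_size = int(corpus_size * sample_fraction)

# Hit rates from 0.01% down to 0.0001% of documents being material.
for hit_rate in (1e-4, 1e-5, 1e-6):
    expected_in_corpus = corpus_size * hit_rate
    expected_in_sample = sample_size * hit_rate
    print(f"hit rate {hit_rate:.4%}: ~{expected_in_corpus:.0f} material docs "
          f"in corpus, ~{expected_in_sample:.2f} expected in the 1% sample")
```

At the 0.0001% end, the reviewed sample is expected to contain 0.01 material documents, so the reviewer will almost certainly see zero positives and the eval measures nothing about recall.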