Using AI to Extract B2B Leads from Unstructured Data
With AI, everything can be turned into a data pipeline
AI agents for go-to-market are hyped, but most teams barely get past lead enrichment and writing email copy.
The next wave in AI for GTM is to run background agents that scan unstructured data sources in real time, and convert them into GTM signals your team can actually act on.
So in today’s post, I’ll show you a system that processes raw SEC filings (via EDGAR) in real time and extracts qualified leads, which you can insert automatically into a CRM or use to alert reps the moment a new opportunity appears - all nearly for free, without relying on vendor data.
For example, say you are a cap table or HR SaaS company like Carta or Gusto. That means you need to reach executives right when they are raising money, since that’s when they typically buy HR or IR software. You can do this if you monitor SEC filings for new private placements, which surface weeks before they appear on TechCrunch, etc.
So why is automation needed in this case? It’s simply too much hassle to manually monitor thousands of SEC filings, and people are forgetful and lazy. Plus, most unstructured data is sparse in information content, which is why we need LLMs to help extract the signals.
Of course, not all SEC filings (there are 300+ types) contain GTM signals or personal contact data. So I’ll pick some examples that make sense to monitor systematically, and go deeper into one of them.
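To make the idea of monitoring one filing type concrete, here is a minimal sketch in Python, not the full system. It assumes that "new private placements" means Form D notices, that EDGAR's public "latest filings" Atom feed is the source, and that each feed entry carries its form type in a category element. The URL parameters, the feed shape, the function names, and the User-Agent value are all assumptions for illustration, so check them against the live feed before relying on this.

```python
import urllib.request
import xml.etree.ElementTree as ET

ATOM_NS = {"atom": "http://www.w3.org/2005/Atom"}

# Assumption: EDGAR's "latest filings" Atom feed, narrowed to Form D.
FEED_URL = (
    "https://www.sec.gov/cgi-bin/browse-edgar"
    "?action=getcurrent&type=D&output=atom"
)

# The SEC asks automated clients to identify themselves.
# Hypothetical placeholder: replace with your own company name and email.
USER_AGENT = "Example Co admin@example.com"


def fetch_feed(url: str = FEED_URL) -> bytes:
    """Download the feed once, with a declared User-Agent."""
    req = urllib.request.Request(url, headers={"User-Agent": USER_AGENT})
    with urllib.request.urlopen(req, timeout=30) as resp:
        return resp.read()


def parse_form_d_entries(atom_xml) -> list:
    """Return one dict per feed entry whose form type is exactly 'D'.

    Amendments (D/A) and any other types the feed may include are skipped.
    """
    root = ET.fromstring(atom_xml)
    leads = []
    for entry in root.findall("atom:entry", ATOM_NS):
        category = entry.find("atom:category", ATOM_NS)
        form_type = category.get("term", "") if category is not None else ""
        if form_type != "D":
            continue
        link = entry.find("atom:link", ATOM_NS)
        title = entry.findtext("atom:title", default="", namespaces=ATOM_NS)
        updated = entry.findtext("atom:updated", default="", namespaces=ATOM_NS)
        leads.append({
            "form_type": form_type,
            "title": title.strip(),
            "url": link.get("href", "") if link is not None else "",
            "updated": updated,
        })
    return leads
```

Calling `parse_form_d_entries(fetch_feed())` on a slow schedule (say, once a minute) would give a list of candidate filings to hand to the extraction step. Parsing is kept separate from fetching so the filter can be tested on saved feed snapshots without touching the SEC's servers.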
What’s to come:
How to operationalize this automation into broader GTM motions, with a human in the loop.
The exact template, and an explanation of key technical details and gotchas.
An overview of GTM signals that can be mined from SEC filings.
How to safely pull real-time (and historical) data from the SEC EDGAR database without paid APIs, while respecting rate limits so you don’t get banned.
Why Unstructured Data?
Before we get into the mechanics, it’s worth asking: why should GTM teams even bother with unstructured or alternative data?
Two reasons:
The ROI of traditional outreach is collapsing, especially because companies are spraying and praying. (Note that in some industries, outreach still works very well.)
Lead databases alone don’t give you alpha. They just let you build cookie-cutter campaigns, which lack timeliness and relevance.
The crux is that playbooks and strategies are getting crowded and, frankly, expensive. Sending Apollo sequences, overpaying for Clay, scraping LinkedIn, etc. are all quickly commoditizing for many SaaS verticals.
Once everyone is doing the same thing with the same tools, you shouldn’t expect outsized returns.
This forces your team to do something different, and one obvious way is to build signals from differentiated data, which is a prerequisite for differentiated campaigns. Whoever acts on the best signals the fastest wins.
This implies that the next wave of GTM is all about alternative and unstructured data sources - satellite images, government filings, in-depth org charts, court records, freight manifests, etc. They’re often messy and take non-trivial work to process, which is exactly why they are valuable (if relevant to your business).
Which leads to SEC filings. They’re legally required, plentiful, and noisy, but inside they are full of valuable details depending on what you are selling.
Need to modernize your GTM and RevOps asap? We offer both white glove services and group training. DM me if you have any questions.