Vertical AI Startups: Solving Industry-specific Problems by Combining AI and Subject Matter Expertise

Low-level task-based AI gets commoditized quickly and more general AI is decades off. In the meantime, will new AI startups succeed or will the value accrue to Google, Facebook, and Amazon?

While most of the machine learning talent works in big tech companies, massive and timely problems are lurking in every major industry outside tech.

What is Vertical AI?

In a recent talk at AI By The Bay, I laid out a four-factor definition of what I consider to be a vertical AI startup.

1. Full stack products

Provide a full-stack fully-integrated solution to the end customer problem from the interface that solves for the need all the way down the stack to the functionality, models, and data that power the interface. This ecosystem is much more defensible over time than just proprietary data or models. Designing the right product interface requires subject matter expertise, and owning the interface allows you to instrument it and gather proprietary data. Then you’re able to build models that drive high-value functionality in a virtuous cycle between the interface and the data. You control the ‘data value chain’ and have pricing power and defensibility over time.

Example: Blue River builds agriculture equipment that reduces chemicals and saves costs. They ‘personalize’ treatment of each individual plant, applying herbicides only to the weeds and not to the crop or soil. They use computer vision to identify each individual plant, machine learning to decide how to treat each plant, and robotics to take precise corresponding action for each plant. Blue River is defensible because it’s incredibly hard to replicate such a complex full-stack product, from gathering the training data for the various models, to incorporating the models alongside robotics into the machines, to integrating these machines into existing farm equipment and distribution channels.

2. Subject matter expertise

Product and sales at vertical AI startups benefit from bringing in key leaders from the industry early on in the business. Building full-stack products requires deep subject matter expertise. Selling these products requires trust, respect, and relationships within the industry. Teams that manage to combine subject matter and technical expertise are able to model the domain richly and drive innovation that comes from thinking outside the box by understanding what the box is. Teams that come with a domain-first approach tend to get stuck inside the box, and teams that come with a tech-first approach tend to get stuck out in left field. There is also a major issue with team evolution -- if you’re unable to set the joint domain-tech DNA early, then one side dominates, and it becomes a real challenge to bring in world-class folks from the other side, as they will never have the same level of authority and respect within the company.

Example: the Zymergen leadership team is a great mix of strong capabilities targeted at industrial biology: commercial (CEO Joshua Hoffman), scientific (CSO Zach Serber), and data (CTO Aaron Kimball). The harder it is to assemble the mixed team and set the company joint-DNA early on, the more defensible the business.

3. Proprietary data

The technology market is hyper competitive. As soon as you demonstrate good results, many people will copy you almost instantly if they can. Defensible AI businesses are built on proprietary data that is difficult to replicate. This happens in two phases, bootstrapping and compounding. In the bootstrap stage, you are building a unique set of training data by aggregating publicly available data and enriching it in some challenging way, running simulations to generate synthetic data, or doing BD deals to gather a set of internal company data. Once you have bootstrapped, you are building a ‘data flywheel’ into your products, so that you are capturing totally unique data over time from how your product is used, and that data capture is designed precisely to serve the needs of your models, which are designed to serve the needs of the product functionality, which is designed to meet the needs of the customer.  This data value chain ensures that the customer’s motivation is aligned with your motivation to compound the value of your proprietary dataset.

Example: Merlon Intelligence gathers training data from compliance analyst interactions with a financial crimes investigation dashboard. Gathering the data requires a full-stack product where the interface is designed and instrumented to gather data that feeds into the models. It’s a learning-to-rank setup -- learning to rank for risk just like the Facebook newsfeed learns to rank for engagement. Banks have a great deal of operational risk in deploying new financial crimes compliance software, so it’s a challenge to penetrate the market. The harder it is to gather your data, and the more it’s intertwined with the product and go-to-market strategy, the more defensible the business.
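To make the learning-to-rank idea concrete, here is a minimal sketch of one common formulation, pairwise ranking with a logistic loss. It is not a description of Merlon's actual system. The feature names, the toy alerts, and the helper names `train_pairwise_ranker` and `score` are all hypothetical, and the sketch assumes that an analyst escalating one alert while dismissing another can be read as "the first should rank above the second."

```python
import math
import random


def score(w, features):
    """Linear risk score: higher means 'show this to the analyst sooner'."""
    return sum(wi * fi for wi, fi in zip(w, features))


def train_pairwise_ranker(pairs, n_features, epochs=200, lr=0.1, seed=0):
    """Fit weights so that score(preferred) > score(other) for each pair.

    Minimizes the pairwise logistic loss log(1 + exp(-margin)), where
    margin = score(preferred) - score(other), by stochastic gradient descent.
    """
    rng = random.Random(seed)
    w = [0.0] * n_features
    pairs = list(pairs)
    for _ in range(epochs):
        rng.shuffle(pairs)
        for preferred, other in pairs:
            diff = [p - o for p, o in zip(preferred, other)]
            margin = sum(wi * di for wi, di in zip(w, diff))
            # d(loss)/d(margin) = -1 / (1 + exp(margin)), so step uphill on margin.
            step = lr / (1.0 + math.exp(margin))
            w = [wi + step * di for wi, di in zip(w, diff)]
    return w


# Hypothetical alert features: [amount_scaled, flagged_country, account_age_scaled]
escalated = [[0.9, 1.0, 0.1], [0.7, 1.0, 0.3], [0.8, 0.0, 0.2]]
dismissed = [[0.2, 0.0, 0.9], [0.3, 0.0, 0.7], [0.1, 1.0, 0.8]]

# Every (escalated, dismissed) combination becomes one training preference.
pairs = [(e, d) for e in escalated for d in dismissed]
weights = train_pairwise_ranker(pairs, n_features=3)

# Rank the review queue by learned risk score, highest first.
queue = sorted(escalated + dismissed, key=lambda f: score(weights, f), reverse=True)
print(queue)
```

The point of the sketch is the data dependency rather than the model: the training pairs only exist because the interface records what analysts do, which is why the product and the dataset are hard to separate.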

4. AI delivers core value

Amazon, Netflix, and Facebook are all companies that use AI to drive very high percentage lift in revenue and engagement. That’s valid and awesome, but AI is not the core value of their products -- Amazon is an ecommerce store, Netflix is a video entertainment company, and Facebook is a social media company. Back when we first started Data Collective, we called this scenario the ‘data side car’ -- like those really cool old motorcycles with an attached sidecar. AI is not the core value, but an attachment that optimizes the core value. By contrast, Vertical AI solutions are about AI unlocking entirely new opportunities rather than just optimizing existing opportunities.

Example: Opendoor’s entire business model for making a more liquid market in real estate is predicated upon the notion that they can use models to price a home so accurately that they can make an offer immediately. The more AI delivers the product's core value by unlocking a totally new opportunity through rich domain modeling within the vertical and models built on top of proprietary data gathered via the product itself, the more defensible the business.

Why Go Vertical?

1. Don’t get ripped off

Solve the business problem directly for the end customer and put yourself in a position of leverage to capture value from the full-stack solution. Avoid being disintermediated from the end customer and getting into a position of weakness. Otherwise, you will wind up solving the hardest technology problems down the stack while being subject to the strength of the solution designers up the stack, who will constantly negotiate you down and erode your slice of the pie.

2. Tasks get commoditized

You might think you have a special market position due to a novel deep net architecture, or because you have invested massive amounts of time in building a named entity or image tagger. The reality is that these low-level tasks are commoditized very quickly. Today’s novelty is tomorrow’s open source, and that’s happening faster and faster each year. Look at low-level tasks as building blocks that you compose into higher-level solutions rather than as the critical IP of your business.

3. Software is eating the world

Every company in every industry needs to be a tech company, but most industries are struggling to deploy tech effectively, let alone AI. Carefully analyze the markets you are considering. Determine whether the incumbents have a protected market position (e.g. through regulation), in which case you should sell them picks and shovels, or whether they lack strong barriers to entry, in which case you may want to go for a disruptive challenger model.

4. Enterprise exits come in cohorts

Over 90% of AI startups are enterprise. Enterprise exits come in cohorts, and many are cohorts within specific industry verticals like financial services or healthcare. Rather than being a singular outlier, you want to be part of a wave of investment focused on a particular cohort of startups going after a niche. Focus your energy on analyzing verticals where both the customer segments and the venture capital community are keen to see solutions, and it will make it much easier for you to sell your products to customers and your company to investors.

Understanding Enterprise Cohorts

The folks at Sapphire Ventures had a couple of good posts on why enterprise funds may return more capital than consumer funds, and how enterprise exits come in cohorts, whereas consumer exits are dominated by outliers like Facebook, Snapchat and WhatsApp.


Compared with consumer startups since 1995, enterprise startups have returned 40% more capital overall. Enterprise and consumer startups have generated equivalent IPO value, but enterprise has generated 2.5X the M&A value.

There are three major advantages to focusing on enterprise:

1. You are aiming at a 40% larger pool of value creation at the time of exit; $825B total exits for enterprise versus $582B total exits for consumer.

2. A broader distribution of value means that you’re probably more likely to create a $B+ company in enterprise than in consumer. The top five enterprise companies account for 11% of total value creation, whereas the top five consumer companies account for over 3X that amount, or 36% of total value creation.

3. The greater value created by M&A means that you probably have greater optionality for large M&A exits ahead of an IPO. Enterprise M&A accounted for $410B of exits, which is 2.5X the $168B of consumer M&A exits.
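The ratios quoted in these three points can be checked with a few lines of arithmetic. This is a quick sanity check using only the figures stated above. It assumes that IPO value is simply total exit value minus M&A value, which is an inference from the text rather than something the Sapphire posts are confirmed to define, and the variable names are illustrative.

```python
# Exit value since 1995, in billions of USD, as quoted in the text.
enterprise_total, consumer_total = 825, 582
enterprise_ma, consumer_ma = 410, 168

# Assumption: whatever is not M&A is IPO value.
enterprise_ipo = enterprise_total - enterprise_ma
consumer_ipo = consumer_total - consumer_ma

# Ratios behind the "40% more", "2.5X" and "over 3X" claims.
total_ratio = enterprise_total / consumer_total
ma_ratio = enterprise_ma / consumer_ma
top5_share_ratio = 36 / 11  # consumer top-five share vs enterprise top-five share

print(f"IPO value: enterprise ${enterprise_ipo}B vs consumer ${consumer_ipo}B")
print(f"Total exits ratio: {total_ratio:.2f}x")
print(f"M&A ratio: {ma_ratio:.2f}x")
print(f"Top-five concentration ratio: {top5_share_ratio:.2f}x")
```

Under that assumption the IPO values come out nearly identical ($415B versus $414B), so the entire gap in total value comes from M&A, which is what makes the optionality argument in point 3 work.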

According to a CB Insights report on AI startups that have raised more than $30M, there are nearly 10X as many enterprise startups as consumer startups.

Selecting Vertical AI Cohorts


First, look for big addressable markets (TAM) with healthy margins. Be scientific when evaluating TAM. Don’t fall into the trap of confirmation bias by seeking out only information that validates your opinions. Rather, think like a scientist and objectively seek out all available data, especially data that challenges your views. Avoid the 1% fallacy, also called the large market fallacy. We’ve all heard the one where ‘all we need to do is get 1% of market X, and we’re golden.’ A proper evaluation of TAM takes significant time and research, but it’s way cheaper and easier than wasting two years of your life chasing a market that is orders of magnitude smaller than you thought, or even worse, nonexistent.

If we’re looking top down at sectors in the US stock market, Finance and Healthcare are the biggest markets with the highest margins.

The next most attractive sectors are energy, utilities, basic industry, and transportation. Since energy and industrials tend to have higher margins, and utilities the lowest margins, you might consider focusing on energy and industrials.

Digging further into CB Insights data on both unicorn startups and AI startups, both have strong vertical representation from fintech and healthcare.

This is a good example where the data are all aligned -- fintech and healthcare are the largest markets with the highest margins and the most representation among both unicorns and AI startups. So these are solid markets to aim at.


Second, look for whitespace. Are there already a lot of other smart people working on this who are probably already the winning cohort? Given the massive investment in autonomous vehicles lately, and the fact that the size of that market is a bit smaller, you might instead consider focusing on a market like energy.

If we look at fintech unicorn cohorts, we see that most of the action has been in lending and payments, which have historically fallen mostly under the traditional banking industry. Insurance is about ⅓ the size of banking in the public markets, but only ⅕ the aggregate valuation and number of startups on the unicorn list.

Figure: Total US Market Cap by Industry in the Finance Sector

As another example, consider the pharma R&D process. Many AI pharma startups focus on finding new candidate compounds that they can sell to pharma companies. This is a sane strategy, because it avoids the $2.9B, 10+ year, and < 10% success rate process to bring that new drug to market. It also surely feels motivating to work on finding new compounds that may help to treat something like cancer, but it leaves open whitespace downstream in the process, where the big money and the big bottlenecks are. Though arguably more of a ‘shallow tech’ than a ‘deep tech’ company, Science 37 is an example of a clinical trials venture that is really innovating on the fundamental model for running trials.




Third, consider timing. The right idea with the right team at the wrong time == the wrong idea.

Remember that the non-consumer stuff is likely to come in a big cohort of exits rather than a single outlier. Ask yourself if you are the only one who sees this opportunity in the market right now. If so, that may not be a good thing. You want the customers within your target vertical to have immediate unmet needs and VCs scouting that vertical ready to invest.

Nobody cares about your idea, they care about their needs. Even when it comes to their own needs, they can only focus on a few needs at a time. So they really only care about the few most timely needs this year. Are you focusing on an issue that is one of the top few needs of the year within your target industry?

One of my favorite descriptions of the importance of timing is laid out in a TED talk by Bill Gross.

He describes how, of the five factors he explored across 100 Idealab startups and 100 non-Idealab startups, timing was the dominant factor driving success. He gives a couple of great examples. Uber and Airbnb were both perfectly timed during a recession, when people needed the extra money. Idealab’s online entertainment startup Z.com launched in the 1999-2000 period, when broadband penetration was too low and streaming video in the browser was janky. Two years later broadband was over 50% penetration, Adobe Flash had fixed the browser issue, and YouTube was perfectly timed.

Look at the market and be really honest with yourself about whether the consumers or businesses you are targeting are really ready for what you have to offer them.


Finally, consider defensibility. My claim is that Vertical AI startups are inherently defensible. According to the four-factor definition above, Vertical AI startups build full-stack products, have subject matter expertise in their vertical, gather proprietary data, and use AI to deliver the core value of their product. Each of these four core components makes the business more defensible.

Full stack products: The more complex it is to create the experience, the more defensible the business.

Subject matter expertise: The harder it is to assemble the mixed team and set the company joint-DNA early on, the more defensible the business. 

Proprietary data: The harder it is to gather your data, and the more it’s intertwined with the product and go-to-market strategy, the more defensible the business.

AI delivers core value: The more AI delivers the product's core value by unlocking a totally new opportunity through rich domain modeling within the vertical and models built on top of proprietary data gathered via the product itself, the more defensible the business. 

Have fun exploring

If you’re interested in vertical AI startups, I encourage you to follow the process outlined above for selecting opportunities based on market, whitespace, timing, and defensibility. In a recent talk at mlprague, I laid out a number of different examples that I find interesting.

Vertical AI has been the exclusive focus of my career: in financial services since 2002, as a startup founder since 2009, and as a founding partner of DCVC since 2011. I started FlightCaster in 2009, which seems to have been the first AI startup in Y Combinator; Prismatic in 2012, which LinkedIn acquired in 2016; and Merlon in 2016, which grew to $Ms in revenue in its first year powering financial crimes compliance for global banks. We may be in an AI startup hype cycle now, but I’ve been doing this stuff for 15 years and will continue doing it long after the current hype cycle has subsided.

If you’re working on a vertical AI startup and you’d like to work together or pitch me an investment opportunity, ping me on Twitter or connect on LinkedIn.