By Rebecca Bakken published October 5, 2017

How Topic Modeling Can Strengthen Your SEO and Content Marketing Strategy


Content marketing lies at the intersection of art and technology, so it’s only logical that artificial intelligence comes into play. AI plays a big role in search and it can be useful in creating content, too.

While it’s not known exactly how Google’s algorithm works, researchers and marketers have conducted experiments that make a strong argument for topic modeling as a key component of the Hummingbird algorithm. Wired also reported last year that Google’s search engine now uses artificial intelligence to rank content.

A basic understanding of how search engines interpret text and judge its quality can help you develop a strategy that produces consistently high-ranking content. Topic modeling – done manually or with software – allows you to create blueprints for the best possible content.

In this post, I explore the concept of topic modeling and how it relates to search algorithms. Then, I go over some key strategies for developing your content to include all relevant topics, and show how you can take advantage of topic model predictions to outperform the competition.

First, let’s dive into what exactly topic modeling is and how search algorithms can use it to rank the billions of pages on the web.

What is topic modeling?

Words and phrases are interrelated, and the idea behind topic modeling is to discover those relationships. When crawling your content, search engines aren’t just looking for specific words; they’re seeking words with related meaning. This makes keywords, keyword variants, and related topics an integral part of creating contextually rich content. The placement and frequency of each keyword should correlate with how important that term is to your piece.

  • Focus topic – Your main topic (i.e., your target keyword) as well as its variants should be mentioned in the headline, first paragraph, and as frequently as possible while sounding natural.
  • Related topics – Each subject most commonly associated with your main topic should have its own subheading within your content to ensure that it’s discussed appropriately.
  • Secondary related topics – These are somewhat tangential to your focus topic, but if your competitors are discussing them, you should too, to improve your chances of ranking. (If you find a secondary related topic is more important than the competition deems it, take that opportunity to differentiate your content and cover it more deeply than the top-ranking pages.)

When thinking about which topics to cover in your content, it helps to think in terms of searcher intent. What questions could your reader possibly have about this topic, and how can you answer them comprehensively? When you achieve this, your content signals to Google’s AI that you’re the authority on that subject – a major factor in ranking.

A note on keyword tools versus topic modeling: It’s true many tools can generate lists of “related keywords,” but the technology behind keyword tools differs from that behind topic-modeling software. Keyword tools primarily look at topics that appear together in high-ranking pages and group them together. They don’t look at the number of co-occurrences between terms, semantic relevance, and other signals, so they don’t necessarily tell you the degree to which topics are related. Conversely, AI software can perform this calculation and rank related topics by importance, while taking other on-page factors into account as well.
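To make the idea of counting co-occurrences concrete, here is a minimal Python sketch. It counts how many pages each pair of terms shares and ranks the terms related to a focus topic by that count. The page data and function names are made up for illustration, and real topic-modeling software weighs many more signals than this.

```python
from collections import Counter
from itertools import combinations

# Hypothetical data: the set of topics found on each top-ranking page.
pages = [
    {"topic modeling", "seo", "content marketing"},
    {"topic modeling", "seo", "keywords"},
    {"topic modeling", "lda", "seo"},
    {"content marketing", "keywords"},
]

def cooccurrence_counts(pages):
    """Count the number of pages on which each pair of terms appears together."""
    counts = Counter()
    for terms in pages:
        for pair in combinations(sorted(terms), 2):
            counts[pair] += 1
    return counts

def related_terms(focus, pages):
    """Rank other terms by how many pages they share with the focus term."""
    ranked = Counter()
    for (a, b), n in cooccurrence_counts(pages).items():
        if focus == a:
            ranked[b] += n
        elif focus == b:
            ranked[a] += n
    return ranked.most_common()

print(related_terms("topic modeling", pages))
```

In this toy data, "seo" shares three pages with "topic modeling" while every other term shares one, so it ranks first. A plain grouping of terms that appear together would not show that difference in degree.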

How to manually create a topic model

Let’s walk through the steps you would follow to manually implement topic modeling for your website.

1. Inventory each page on your site.

To stay organized, keep a spreadsheet. Every URL should have a row, and columns should indicate these points:

  • Focus topic
  • Quality and comprehensiveness
  • User intent bucket (e.g., evaluate, compare, purchase)
  • Need for improvement (yes/no)

(Sorting in spreadsheets requires uniform inputs. Identify your target focus topics, a scale for quality, and user intents ahead of time.)
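If you keep the inventory in code rather than a spreadsheet, the same "uniform inputs" rule can be checked automatically. The sketch below is one possible layout, not a required one: the field names, the 1-to-5 quality scale, and the example focus topics are all assumptions chosen for illustration.

```python
from dataclasses import dataclass

# Assumed vocabularies; replace them with the values your team agrees on.
INTENTS = {"evaluate", "compare", "purchase"}
QUALITY_SCALE = range(1, 6)  # 1 = thin, 5 = comprehensive

@dataclass
class PageRow:
    url: str
    focus_topic: str
    quality: int
    intent: str
    needs_improvement: bool

def validate(row, focus_topics):
    """Return a list of problems that would break sorting or filtering."""
    problems = []
    if row.focus_topic not in focus_topics:
        problems.append(f"unknown focus topic: {row.focus_topic!r}")
    if row.quality not in QUALITY_SCALE:
        problems.append(f"quality out of range: {row.quality}")
    if row.intent not in INTENTS:
        problems.append(f"unknown intent: {row.intent!r}")
    return problems

focus_topics = {"topic modeling", "content strategy"}
row = PageRow("/blog/lda-basics", "Topic Modelling", 4, "evaluate", False)
print(validate(row, focus_topics))
```

The example row is flagged because "Topic Modelling" does not exactly match the agreed focus topic "topic modeling". That kind of small inconsistency is what makes spreadsheet sorting unreliable.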

2. Identify gaps in topic coverage.

Once you’ve done this for each page, look at your focus topic column and see which topics are covered in depth, and which have gaps. For the topics with gaps, plan to create new content to cover them.

3. Prioritize pages to update.

For pages you’ve identified that need improvement, identify the subjects related to your focus topic that need to be added. URLs with the most shallow topic coverage should take priority, particularly if their focus topic is important to your business.
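Steps 2 and 3 amount to a filter and a sort. Here is a minimal sketch with made-up inventory rows. It treats a "gap" as a target topic with no page at all, and it assumes you have assigned each topic a business-importance score; both are simplifications you may want to adjust.

```python
# Hypothetical inventory rows: (url, focus topic, quality 1-5, needs improvement)
inventory = [
    ("/blog/what-is-lda", "topic modeling", 2, True),
    ("/blog/seo-checklist", "on-page seo", 4, False),
    ("/blog/tf-idf-intro", "topic modeling", 3, True),
]
# Assumed business importance per target topic (higher = more important).
target_topics = {"topic modeling": 3, "on-page seo": 2, "searcher intent": 3}

def coverage_gaps(inventory, target_topics):
    """Target topics that no page currently uses as its focus topic."""
    covered = {topic for _, topic, _, _ in inventory}
    return sorted(set(target_topics) - covered)

def update_queue(inventory, target_topics):
    """Pages needing work: shallowest coverage first, then most important topic."""
    flagged = [row for row in inventory if row[3]]
    return sorted(flagged, key=lambda r: (r[2], -target_topics.get(r[1], 0)))

print(coverage_gaps(inventory, target_topics))
print([url for url, *_ in update_queue(inventory, target_topics)])
```

With this data, "searcher intent" shows up as a gap to fill with new content, and the thinner of the two flagged pages comes first in the update queue.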

The rest is a matter of execution for your content team. You can help them by creating blueprints that detail the ideal word count (based on pages ranking for your focus topic), related topics, and user intent profile for each post. For any piece of content you create, follow this basic framework:

On-page factors

Each page on your site should have a framework that signals to search engines what topics your site covers. These on-page signals are important for SEO, and can provide guidance to make your content creation process more efficient:

  1. Page title or headline – What is your page about? Be clear, succinct, and specific when choosing your titles (but also remember to keep it lively and attention-grabbing).
  2. Lead or introductory paragraph – Your readers are impatient, and search engines give weight to your first paragraph. Get to the point in your first sentences, and be sure to include your focus topic for that page.
  3. Headings and subheadings – Those H1 and H2 tags in your content are more than formatting codes. Search engines crawl your headings and subheadings to derive meaning from your content.
  4. Linking – Internal and external links, as well as their anchor text, are strong signals that tell Google what you’re writing about and the associated content that you value.
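To audit these four signals on an existing page, you can pull them out of the HTML. The sketch below uses Python's standard html.parser module. The class name and sample markup are invented for illustration, it assumes reasonably well-formed HTML, and it says nothing about how any search engine actually parses a page.

```python
from html.parser import HTMLParser

class OnPageSignals(HTMLParser):
    """Collect the title, h1/h2 headings, first paragraph, and link anchor text."""

    def __init__(self):
        super().__init__()
        self.title = ""
        self.headings = []         # (tag, text) pairs
        self.first_paragraph = ""
        self.links = []            # (href, anchor text) pairs
        self._open = []            # stack of [tag, href, text parts]
        self._seen_paragraph = False

    def handle_starttag(self, tag, attrs):
        first_p = tag == "p" and not self._seen_paragraph
        if tag in ("title", "h1", "h2", "a") or first_p:
            self._seen_paragraph = self._seen_paragraph or first_p
            self._open.append([tag, dict(attrs).get("href"), []])

    def handle_data(self, data):
        # Text belongs to every element that is currently open.
        for capture in self._open:
            capture[2].append(data)

    def handle_endtag(self, tag):
        if not self._open or self._open[-1][0] != tag:
            return
        _, href, parts = self._open.pop()
        text = " ".join("".join(parts).split())
        if tag == "title":
            self.title = text
        elif tag in ("h1", "h2"):
            self.headings.append((tag, text))
        elif tag == "p":
            self.first_paragraph = text
        else:
            self.links.append((href, text))

html = """
<html><head><title>What Is Topic Modeling?</title></head>
<body>
<h1>What Is Topic Modeling?</h1>
<p>Topic modeling discovers <a href="/glossary/lda">related terms</a> in text.</p>
<h2>How search engines use it</h2>
<p>Second paragraph.</p>
</body></html>
"""
signals = OnPageSignals()
signals.feed(html)
print(signals.title)
print(signals.headings)
print(signals.first_paragraph)
print(signals.links)
```

Once you have these fields for a page, checking whether the focus topic appears in the title, the first paragraph, and at least one heading is a simple string comparison.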

A less visible factor to consider is whether your content answers your users’ questions. This is key as search engines favor content that directly responds to queries. Think about the questions your visitors might be asking and provide a thorough answer.

Writing content that speaks to searcher intent thoroughly will get you far, but it really helps to have some tools at your disposal.

Automate the process

Advanced artificial intelligence platforms such as MarketMuse (disclosure: a client and a platform that I use successfully) are available to find topical gaps in your content, identify related topics, and generate outlines and keyword lists to guide you to topical authority. Creating and updating content is much easier when you give your team clear-cut, data-based recommendations in an outline. Alternatively, you could pull organic keyword data from SEMrush and use Excel to filter and analyze it to give you a list of terms for which top content is ranking (i.e., topics you should cover with your own content).

AI solutions can remove a lot of the guesswork when deciding which topics to cover, and because they use technology similarly to the way search engines do, they provide data that produces predictably positive results.

Many content marketers will be satisfied with the information already covered. But if you’re one of those next-level, data-obsessed marketers, read on.

Going more in depth for data geeks

There are several different types of topic models, and, for most marketers, it’s not terribly important to know the minutiae of each one. In general, a topic model is a text-mining method that determines how relevant a body of text is to a set of topics. Topic modeling allows algorithms to analyze vast amounts of web content, assigning topical relevancy to each page and ranking it efficiently and accurately for each query.

Additionally, it allows for advanced content analysis and search engine optimization software to identify topical gaps in content. When you understand how topic modeling works, you’re better able to plan your content and effectively use the tools at your disposal.

One of the more common topic models for identifying topical probability is latent Dirichlet allocation (LDA), a statistical model in natural language processing. An LDA model views individual documents within a corpus – or, in search terms, pages within a site – and determines the relevancy of each page to a topic, assigning a percentage for topics mentioned. Here’s a simplified example:

Page 1: There are many delicious vegetarian dishes.

Page 2: A vegetarian diet contributes to a healthy lifestyle.

Page 3: Julia Child was one of the first TV chefs.

Page 4: Julia Child cooked many types of vegetarian dishes.


Pages 1 and 2: 100% topic A

Page 3: 100% topic B

Page 4: 50% topic A and 50% topic B

Using an LDA model, a software platform would be able to infer that the site as a whole is about vegetarian cooking, and that it also discusses some famous chefs who cook vegetarian recipes. One characteristic of LDA modeling is that a topic isn’t defined as a single thing, but rather a group of related terms.

For instance, topic A in this example would include vegetarian dishes, vegetarian cooking, vegetables, and so on. Topic B would include Julia Child, famous chefs, TV chefs, etc. Having groups of terms within each topic means that search engines can more intuitively pull pages related to a query, even if the person did not search for that exact term, giving the searcher the greatest number of relevant results.
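The percentages in the simplified example can be reproduced with a few lines of Python. To be clear, this is not LDA: the two term groups below are picked by hand, whereas an LDA model would learn them from a large corpus. The sketch only illustrates what "a topic is a group of terms" and "a page is a mix of topics" mean in practice.

```python
import re

pages = {
    "Page 1": "There are many delicious vegetarian dishes.",
    "Page 2": "A vegetarian diet contributes to a healthy lifestyle.",
    "Page 3": "Julia Child was one of the first TV chefs.",
    "Page 4": "Julia Child cooked many types of vegetarian dishes.",
}

# Hand-picked term groups standing in for the topics LDA would learn.
topics = {
    "A": {"vegetarian", "dishes", "cooking", "vegetables", "diet"},
    "B": {"julia", "child", "chef", "chefs", "tv"},
}

def topic_shares(text, topics):
    """Share of a page's topic-bearing words that fall into each topic."""
    words = re.findall(r"[a-z]+", text.lower())
    hits = {name: sum(w in terms for w in words) for name, terms in topics.items()}
    total = sum(hits.values())
    return {name: (n / total if total else 0.0) for name, n in hits.items()}

for name, text in pages.items():
    print(name, topic_shares(text, topics))
```

Pages 1 and 2 come out as entirely topic A, page 3 as entirely topic B, and page 4 as an even split, matching the breakdown above.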

LDA isn’t the only type of topic modeling, however. If you’re interested in taking a deep dive, here are some other topic models to consider:

Additionally, databases such as Google’s Knowledge Graph and techniques such as TF-IDF (term frequency–inverse document frequency; a much simpler method than LDA), co-occurrences, and co-citations are also used to determine the semantic meaning of a body of text.
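For the curious, TF-IDF is simple enough to sketch in plain Python. The version below uses one common textbook formulation (term frequency as a share of the page's words, multiplied by the natural log of total pages over pages containing the term). Libraries and search engines use various weighted and smoothed variants, so treat the exact numbers as illustrative.

```python
import math
import re
from collections import Counter

docs = [
    "There are many delicious vegetarian dishes.",
    "A vegetarian diet contributes to a healthy lifestyle.",
    "Julia Child was one of the first TV chefs.",
    "Julia Child cooked many types of vegetarian dishes.",
]

def tokenize(text):
    """Lowercase the text and split it into alphabetic words."""
    return re.findall(r"[a-z]+", text.lower())

def tf_idf(docs):
    """Return one {term: score} dict per document."""
    tokenized = [tokenize(d) for d in docs]
    n_docs = len(tokenized)
    # Document frequency: the number of documents containing each term.
    doc_freq = Counter(term for tokens in tokenized for term in set(tokens))
    scores = []
    for tokens in tokenized:
        counts = Counter(tokens)
        scores.append({
            term: (count / len(tokens)) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        })
    return scores

scores = tf_idf(docs)
top_terms = sorted(scores[2].items(), key=lambda kv: kv[1], reverse=True)[:3]
print(top_terms)
```

A term such as "vegetarian", which appears on three of the four pages, scores low, while a term unique to one page, such as "tv", scores high. That is the whole idea: TF-IDF rewards terms that distinguish a page from the rest of the collection.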

It’s not necessary to have an intimate knowledge of topic modeling to create a successful content marketing strategy. But knowing how search algorithms categorize pages is useful when deciding which topics to cover, the pages on which they should be mentioned, and how frequently you should write about your core and secondary topics.


Topic modeling plays a significant role in planning and search, in both obvious and subtle ways. Now that you have a basic understanding of how topic models work and how they’re applied to content, you can plan and organize your content in a way that puts topics – not keywords – at the forefront of your content marketing strategy.

Please note: All tools included in our blog posts are suggested by authors, not the CMI editorial team. No one post can provide all relevant tools in the space. Feel free to include additional tools in the comments (from your company or ones that you have used).

Cover image by Wilfred Ivan, StockSnap

Author: Rebecca Bakken

Rebecca Bakken is a Boston-based freelance writer and editor, specializing in SEO and tech. She has a background in journalism, and has managed content teams in both large agencies and small startups. She's the owner of R.M. Bakken Editorial, LLC. Connect with Rebecca on LinkedIn or follow her on Twitter @rakkenbakken.

  • Eric Van Buskirk

    Peeps, get on board w/ Google’s AI, or lose & just aim to rank w/ Yahoo! Awesome, Rebecca!

    • Rebecca Bakken

      Thanks, Eric!