Semantic keywords (or LSI keywords) are words that share a conceptual field with a core keyword. Using them helps you create content that matches the expected discourse around a topic, which improves both SEO and UX.

This article looks at what semantic keywords are, how to find them, and how to place them in your content.

What Are Semantic Keywords?

Semantic keywords are words and phrases that have some kind of conceptual link to your main, core keyword. They’ve gone by a lot of names over the years, but the marketing (SEO) community has generally settled on LSI (Latent Semantic Indexing).

What does that mean? 

It’s really just a case of semantics. 

Semantics: the branch of linguistics and logic concerned with meaning. 

LSI is where you take the language (discourse) which surrounds a piece of content you’re creating, and you make sure that your piece of content fits. 

Imagine you’re going to a party. It’s a fancy dress party, the theme being animals. You show up in a tux with a rose-red bowtie. You’re not going to fit in. Depending on the party, you may get thrown out. At best, you’re going to be seen as an oddball.

In an age where search engines are constantly improving their algorithms for judging relevancy, you don’t want to be too much of an oddball.

Let’s take a look at a linguistic example. Which of the words below does not fit?

  • Ice cream
  • Yogurt
  • Cheese
  • Orange juice

If you picked ice cream… you would be wrong.

You would be right if you picked “Orange Juice”. But why?

On a textual level, the words are all as similar and different as one another. “Ice cream” has two words, so it can’t be that. “Yogurt” has a “u”, so it can’t be that. The difference is the concept: orange juice simply isn’t a dairy product.

Semantic keywords would be all of the words apart from orange juice… if we were talking about dairy products. 

And that leads into the second thing about semantic keywords: you need a frame of reference.

Framing Is Key to Semantics

In order for semantic keywords to make sense, you need a frame of reference. 

For most digital articles, this frame of reference is going to be your core keyword.

Let’s say that you’re writing a piece of content about hamsters (I don’t know why, hamsters just come into my head a lot).

Well, stage one of your semantic hamster research is going to mean narrowing down your frame of reference to a particular article. 

Take a look at finding content opportunities through comment analysis if you’re stuck for where to go here, or simply use a keyword tool. Either way, you’re going to need to find a specific article idea first. 

Once you’ve framed your article, you’re ready to get started.

To keep things simple, for the rest of this article, we’re going to work on identifying the semantic keywords for the keyword (phrase) “are hamsters good pets”. This is the article we want to write and rank for. 

Find Your First Semantic Keywords

What better way to get started with semantic keywords than by finding your own! 

There are four primary avenues for finding semantic keywords:

  • Google Search
  • Wikipedia
  • Reddit
  • Social Media

Google Search

This section is going to talk about creating a corpus from a range of different texts related to your targeted keyword. The idea here is that we want to understand the language “network” that search engines like Google, and readers, expect. 

To begin, we’re going to create a corpus from the page one Google search results. We want to keep our corpora separate during this stage as each offers a different type of data. 

If you’re like me, maybe you have a program that will automatically scrape Google’s data (to be released). If not, then you’re going to have to do this manually: simply search for your keyword, click through to each page, and copy and paste the content into a notepad document.

Note: Before you do this, do some keyword research on your main term. If it’s not a high-volume, high-difficulty term, you can probably just create a really great piece of relevant content and skip this step.
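If you’ve pasted each page-one result into its own text file, combining them into a single corpus file takes only a few lines of Python. This is a rough sketch; the folder and file names are placeholders of my own, not anything the tools above require:

```python
from pathlib import Path

def build_corpus(pages_dir, out_file="corpus.txt"):
    """Concatenate every .txt page in pages_dir into one corpus file."""
    pages = sorted(Path(pages_dir).glob("*.txt"))
    corpus = "\n\n".join(p.read_text(encoding="utf-8") for p in pages)
    Path(out_file).write_text(corpus, encoding="utf-8")
    return len(pages)
```

Point it at the folder holding your saved pages and load the resulting corpus.txt into your corpus tool.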

Once you’ve done that, it’s time to boot up your favorite corpus tool and get to analyzing the discourse around it. 

To start, we’re going to take a basic word list. 

All of the words you find here have value. But it’s also useful to assign some kind of statistical significance to help filter the data. In most cases, I will set a 5% frequency cutoff; in other words, the word needs to appear in more than 5% of the corpus. This may seem like a lot, but it really helps to sort the most important semantic keywords for SEO from the less important.

You can continue to reduce this threshold if you find that it doesn’t provide enough semantic keywords. 
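Outside a dedicated corpus tool, the same cutoff is easy to sketch in plain Python. The toy corpus below is invented for illustration; a real one would hold the full text of your page-one results:

```python
from collections import Counter

def significant_words(corpus_text, cutoff=0.05):
    """Return words whose relative frequency exceeds the cutoff (default 5%)."""
    words = corpus_text.lower().split()
    total = len(words)
    counts = Counter(words)
    return {w: n / total for w, n in counts.items() if n / total > cutoff}

# Tiny illustrative corpus; with a higher cutoff only the repeated words survive.
corpus = "hamsters are good pets because hamsters are cheap and hamsters are quiet"
print(significant_words(corpus, cutoff=0.1))
```

Lowering the cutoff argument is the programmatic equivalent of reducing the threshold in your corpus tool.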

Refining Discourse

Relevant content from sites like Wikipedia and Reddit, and from social media, should be put into a separate corpus.

These provide insight into how your audience speaks about the topic – deeper insight into the discourse. As we discussed with Coca-Cola, this is vital to creating change and inspiring action.

When selecting texts to take information from, don’t just lump in everything you can find. Take a look at the content and see if it aligns with your brand’s own values.

Once you’ve collected the corpus, perform the same semantic keyword research as before.

Taking Semantic Keywords Deeper

Once you’ve created a list of semantic keywords, it’s actually possible to take this even further and create a network of the language situated around high-performing content that focuses on your keyword. 

A lot of SEO strategists do this without realizing (and in a somewhat haphazard way). We’re going to inform our strategy with data. 

Using Collocates

Once you’ve collected your keywords and have decided you’re really going to take things to the next level, it’s time to start working with collocates.

Collocates tell you which words occur together most frequently, based on a span set by you, the user. A span is the number of words to either side of your primary word.
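To make the idea concrete, here’s a rough sketch of a collocate count in plain Python. The sample sentence is made up, and AntConc and similar tools do all of this for you; this is just the mechanism laid bare:

```python
from collections import Counter

def collocates(text, node, span=3):
    """Count words occurring within `span` words either side of `node`."""
    words = text.lower().split()
    counts = Counter()
    for i, w in enumerate(words):
        if w == node:
            # Window of `span` words before and after the node word
            window = words[max(0, i - span):i] + words[i + 1:i + 1 + span]
            counts.update(window)
    return counts

text = "slimming down your hamster will benefit your hamster in the long run"
print(collocates(text, "hamster").most_common(3))
```

The words with the highest counts are your strongest collocates for that node word.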

Collocates can get really complicated really quickly, so I highly recommend limiting your span and the number of words you analyze when doing this. If you overdo it, you’re also in danger of receiving the dreaded duplicate content penalty.

To begin working with collocates, take your previous corpus and stick it back into a tool that allows for an analysis of collocates.

Load up the collocate section and set the span to 3L and 3R (three words to the left and three to the right). Type in a keyword you want to look at (I’m going to pick “hamster”) and hit go.

Here you’ll be presented with a bunch of collocates. These are going to have relatively low volume, but in some cases, they are going to be useful. You can then click on them to learn more. 

“Slimming” is interesting. Why is that there? Let’s click on it and find out:

What to do when your hamster is fat 

Slimming down your hamster will be to his benefit…

So it’s more of a question and answer about hamster care. Care is obviously an important quality in articles about why hamsters are good pets. This also gives us an idea for a further article, as well as for how we should talk about the current one.

There are a bunch of other collocates we can take a look at here. Collect the most frequent collocates and add them to your semantic keyword list. 

Putting Your Network into Play

The final step is putting your semantic keywords into play. 

Semantic keywords are very different from core keywords.

Firstly, there should be a lot more of them. Secondly, they are much more in tune with user experience. Thirdly, some of them (collocates) will potentially need to be bunched together.

So when putting semantic keywords into your text, keep in mind the following rules:

  1. Evenly distribute them throughout your text (with the exception of collocates).
  2. Understand the deeper connections (collocates).
  3. Provide them in association with genuine, useful content.
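Rule 1 is easy to sanity-check in a script. This sketch (the draft sentence and the helper name are my own, purely illustrative) reports each occurrence of a keyword as a fraction of the way through the text, so clustering stands out at a glance:

```python
def keyword_positions(text, keyword):
    """Relative positions (0.0 to 1.0) of a keyword through a text."""
    words = text.lower().split()
    hits = [i for i, w in enumerate(words) if w == keyword.lower()]
    return [i / (len(words) - 1) for i in hits] if len(words) > 1 else hits

# Evenly spread keyword: positions land near the start, middle, and end.
text = "hamster care matters because a happy hamster lives longer than a bored hamster"
print(keyword_positions(text, "hamster"))
```

Positions bunched near 0.0 or 1.0 suggest the keyword is clustered rather than evenly distributed.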

Conclusion

So there you have it. This article has outlined how you can easily create content that matches search engine and reader expectations.

As an added bonus, because you took a look at the way the community talks about your products through Reddit and social media, you also have access to the discourse surrounding the product. This means it’s just as easy to create content that emotionally resonates.

…Just make sure to add your own rhetorical spin!

Sometimes it’s useful to take a huge sample of tweets that include a specific keyword or hashtag. This can help marketers to identify social trends, content opportunities, and areas to enter the conversation.

While there are several tools available to provide a general analysis of social trends, few give you access to the full data set. This script offers a free way to scrape as many tweets as you require related to a single keyword or hashtag and then insert them into a corpus for analysis.

In order for the code below to work, you will need Python installed, along with the NLTK library and a set of Twitter API credentials. Before running the script, set the TWITTER environment variable to point to the directory containing your credentials file:

export TWITTER="<path to your credentials directory>"

Once you’ve done this, you can then run a python file that contains the following code:

from nltk.twitter import Twitter

tw = Twitter()

print("====================================================")
print("Twitter Keyword Live Stream")
print("====================================================")

keyword = input("Enter your keyword > ")
setsize = int(input("Set number of tweets for corpus > "))

# Collect tweets matching the keyword (stream=False uses the Search API)
tw.tweets(keywords=keyword, stream=False, limit=setsize)

print("====================================================")
print("FINISHED.")
Follow the prompts and then copy the output into a notepad document for analysis in your favorite corpus tool (I’ll forever recommend AntConc).

Google offers digital marketers a huge number of resources to get your marketing strategy off the ground. Not just in its marketing apps, but in the SERPs (Search Engine Results Pages) themselves.

This is a quick guide on how to use SERPs to create better link-building content that has a higher chance of ranking – before even stepping foot in a keyword tool.

A Breakdown of a SERP

SERPs are no longer the simple, homogeneous body they used to be. Now, there are over 20 different sections that can pop up depending on how Google interprets an individual search query. A couple of examples of these different sections include:

  1. The Search Bar Itself
  2. A Short Informational Summary
  3. More Detailed Information
  4. Top Stories
  5. Shopping Results
  6. The Actual Search Results
  7. Searches Related To…

The sections that appear are based largely on the semantic attributes Google associates with your query. At the time of writing, Google categorizes search queries as either Informational, Navigational, or Transactional. It does this through a linguistic analysis of what you’ve written. For instance:

  • “What is Valentine’s Day” = Informational
  • “Where to Buy the Best Valentine’s Day Gifts?” = Navigational
  • “Valentine’s Day Gifts” = Transactional

Particular phrases indicate particular requirements on the part of the searcher. As Google search has become increasingly powerful, so too has its ability to deliver content that is progressively more relevant.
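Google’s actual classification is proprietary and far more sophisticated, but a toy rule-based version makes the idea tangible. The rules below are my own illustrative guesses, nothing more:

```python
def classify_query(query):
    """Toy intent classifier: crude keyword rules, for illustration only."""
    q = query.lower()
    # Question openers tend to signal an informational need
    if q.startswith(("what", "how", "why", "who", "when")):
        return "Informational"
    # "Where (to)" queries look for a place or destination
    if q.startswith("where") or "where to" in q:
        return "Navigational"
    # Bare product phrases default to a transactional reading
    return "Transactional"

for q in ["What is Valentine's Day",
          "Where to buy the best Valentine's Day gifts",
          "Valentine's Day gifts"]:
    print(q, "->", classify_query(q))
```

Even this crude version shows how surface phrasing alone carries a surprising amount of intent.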

This not only provides an incentive for website owners to create better, more relevant content, it also makes it easier for searchers to find what they’re looking for.

This entire process relies on a linguistic analysis of search queries and on-page content.

Getting Started With Semantic Optimization

To get started with semantic optimization, you’re going to need to open a relevant search. Let’s say we’re looking to promote a Valentine’s Day product: we would type “Valentine’s Day gifts”.

Ignore everything on the top and scroll straight to the “Searches Related To” section at the bottom of the page. Here, you can find frequently searched terms which are also related to your primary search query above.

The first four results that appear are:

  • valentine’s day gifts for her
  • sentimental valentine’s day gifts for girlfriend
  • non cheesy valentine’s day gifts for him
  • romantic valentines gifts for her

Immediately, several key words and phrases can be seen: “for her”, “for him”, “sentimental”, “non-cheesy”, and “romantic”.

Confirming with the Search Bar

Your next port of call is the search bar. Start typing in a relevant query and see what comes up as you type.

For instance, if I start with “best valentine”, I’m given a lot of entirely new results, but I’m also given some recurring search phrases as well. “best valentines gifts for him” and “best valentines gifts for her” seem to be very popular, so I’m going to move forward with those.

Repeating this method with different search terms I’ve gathered from the original SERP can also net me some new ideas and language to play with.

I find that a good way to organize this data is by marking down each search term and then giving it a tick for each time it or a similar search term appears. Eventually, you should end up with several which are marked much higher. These will become your primary contenders for on-page optimization.
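That tick-marking step is easy to automate with a counter. The suggestion lists below are hypothetical stand-ins for what you would collect from several passes at the search bar:

```python
from collections import Counter

# Hypothetical autocomplete suggestions from three different seed queries
passes = [
    ["best valentines gifts for him", "best valentines gifts for her", "valentines day ideas"],
    ["best valentines gifts for her", "romantic valentines gifts for her"],
    ["best valentines gifts for him", "best valentines gifts for her"],
]

# One "tick" per appearance; the highest counts become your primary contenders
tally = Counter(term for suggestions in passes for term in suggestions)
for term, ticks in tally.most_common():
    print(ticks, term)
```

The terms at the top of the tally are the ones to carry forward into on-page optimization.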

Supporting Keyword Strategies with Other Software

Once you’ve narrowed down a list of potential suspects, you’re going to want to use third-party software to confirm them. I tend to use KWFinder as it provides data on how difficult it will be to rank for a particular keyword or phrase. It also currently offers 5 searches per day for free.

What we mainly want to look at is the suggestions page and the DIFF (difficulty) score. The lower the score, the better the search term; the higher the score, the worse. 30 is an OK score, but something lower would be even better.

We also want to look at the search column. This provides us with data on the search volume. We want to find something that lands on a happy medium: enough volume to be worth targeting, without the competition that the biggest terms attract.

Rinse and repeat with the other search terms you have and rank according to the power behind them (search volume and ranking difficulty).

Once you start creating your content, take the most powerful keywords and place them as the most prominent (no keyword stuffing). Try to work in as many of them as you can organically (again, no keyword stuffing), and you’ll start to see your content rank quickly.

The Role of Schema

I’m going to talk about schema briefly as it relates closely to the pragmatics problem associated with SEO.

Written word, unlike verbal communication, lacks some of the contextual clues that come with an in-person conversation. Medium, sound, body language, etiquette: all of these things contribute to meaning but are unfortunately absent in written communication.

Schema takes a step towards bridging this gap by providing a revealing co-text, as opposed to context. This co-text is coded behind the content.

Marketers interested in understanding the purpose of schema should explore the history of linguistic tagging to see how it can benefit their organic results. 

The two fundamental positionings of Identity and Journey are the basis for creating powerful user experiences with content, enabling a change in a brand’s position and power within its industry. They provide all the information a marketer needs to know where to start.

Identity

A brand’s identity highlights the language and discourse that surround it. It brings to light the ideals, beliefs, and attitudes of its audience. With this, brands are able to leverage the most significant and accessible features of a reader to convince them to convert.

A powerful brand doesn’t just access and adopt the ideals, beliefs, and attitudes of its audience, it influences them through shifting an individual’s relationship with both their own and the brand’s identity. By doing this, a brand is able to create a stable relationship that pushes towards sustainable conversion.

Journey

A journey starts at nothing and moves to something. Every brand has its own journey, regardless of how similar those journeys may appear. Journey is intertwined with the elements of identity that make its ability to sell so powerful.

To keep things simple, the journey can be broken down into the three fundamental stages of a sales funnel: awareness, consideration, decision. Each of these stages defines a different user mindset. The further a user progresses, the more susceptible they are to shift.

A user’s susceptibility to shift is important in understanding how a journey and its discourse can be influenced. When you influence discourse, you are able to reposition your brand within your industry. That is to say: changing the way your brand is talked about changes how your brand is viewed, and how significant your brand’s journey and identity are to the individual.

The Shift

The Shift comes from horizontal movement across a customer’s journey, with each “column” embodying a different set of ideals which slightly change from the previous one. A good example of the shift can be seen in the travel industry.

Audience     Shift 1      Shift 2      Shift 3      Ideal
travel       travel       travel       travel       travel
Asia         Asia         Asia         Asia         Asia
planning     backpacking  backpacking  backpacking  backpacking
island       island       eat          eat          eat
Bangkok      Bangkok      Bangkok      mountains    mountains
need         need         need         need         best

Instead of starting with our ideal landscape, analyzing what an audience thinks allows us to access the discourse that already exists. We then line this up with our ideal result and begin the process of shift by slowly exchanging concepts where we can. This process is not 100% foolproof; like anything in marketing, trial and error (with collection of results) is integral to its success.

There are situations where it’s important to align your own marketing voice with that of another brand. Perhaps you are a reseller of their products or services, or perhaps you’re simply trying to compete in a market with a niche audience.

Purpose:
To align your organization’s voice with that of another brand.
Toolset:
AntConc, web scraper / web browser

Sources of Voice

There are three primary sources you can take your information from when constructing a clear voice. These include the target brand’s:

  • Internal marketing
  • Website
  • Social content

The first of these is going to be the most definitive resource for promoting a product – if it exists and you are given access to it. Otherwise, the second two options are a suitable alternative.

Statistical Significance

Not all the data you find is going to merit mentioning in a brand voice guide. In fact, the bulk of it is going to be useless. In order to differentiate between what you should adopt and what you shouldn’t, you’re going to need to have a basic grasp of statistical significance.

Here, statistical significance means that a word must account for at least 0.5% of the corpus. That means that if you have a collection of 4,000 words, the word should appear at least 20 times. This might not sound like much, but for a single word it is. If an organization has a clear brand voice, core language should be consistent throughout its content.
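The arithmetic is worth sketching: 20 occurrences in a 4,000-word corpus is a relative frequency of 0.5% (20 / 4000 = 0.005). A tiny helper makes the check explicit:

```python
def meets_threshold(occurrences, corpus_size, threshold=0.005):
    """True if a word clears the relative-frequency bar (0.5% by default)."""
    return occurrences / corpus_size >= threshold

print(meets_threshold(20, 4000))   # 20 / 4000 = 0.5%, exactly on the bar
print(meets_threshold(12, 4000))   # 0.3%, below the bar
```

Adjust the threshold argument if your corpus is unusually large or small.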

Collecting a Lexicon

Either use a web scraper or copy and paste the brand’s website into a text (.txt) document. To avoid skewing your data, go through the content and delete any boilerplate: any copy that is repeated throughout the website (menus, footers, headers, etc.). Once you have all the website’s content in a text file, you can then load it into a corpus analysis tool. Perform a word count analysis.

Find words that have statistical significance and sort them into the categories of noun, verb, and adjective. If a word might fit into two of these, check its concordance lines to see how it is being used.

If you decide to do this with a brand’s social content, bear in mind that a larger corpus is needed. Most web pages contain less than 500 words, meaning that core pages from a brand’s website tend to total less than 10,000 words. A social corpus should contain at least twice that.

The Statement and the Sentence

The statement is a three- or four-word phrase that embodies the language and phrasing a company uses. It can serve as a central crux of a brand voice, and it helps content writers get a feel for it before delving deeper.

You can find the statement by taking a noun, an adjective, and a verb, and putting them together in a way that makes sense. To really take this to the next step, go back to AntConc and check for collocations. This will help you to build a clearer idea of when and how the target brand uses different words together.

The sentence is like the statement but longer. In the sentence, try to work in as much of the lexicon as possible to create one long example of the brand’s voice.

Market Segmentation is one of the most important parts of putting together a content strategy, along with personas. Who is buying your product or using your service? Where do they come from and how can you best market yourself to them?

Without market segmentation, you’re like a fisherman casting your net at the beach and coming up with nothing 99% of the time – except that one time you got lucky.

But how do you put together market segments? If you’re an established business, then you’ve probably got a lot of data on past customers. Geographic, Demographic, and lifestyle segmentation can easily be achieved with that data, but what about Benefit Segmentation?

Benefit Segmentation is the division of consumers based on their wants and needs. The question you should be asking yourself is: how does my customer benefit from my product?

Here, I’ll offer a simple technique for using comments to construct benefit segments yourself. Whilst this can be repeated with any type of business, I’m using a sample from an internal travel blog. Here, the question I need to ask myself is “What is my audience getting out of the content I create?”


Why Don’t I Just Read the Comments Myself?

You’re probably wondering why you can’t just do this yourself. Read through a couple of comments, look at what people are saying, and go ahead and use that as your rationale. Well…

You can. There’s nothing stopping you from doing that. However, if you’re looking to put together an airtight market segmentation strategy based on data, you’re going to need quantity, and reading through 1,000 or 10,000 different comments just isn’t a productive use of your time – especially if you can automate most of the process in the first place.

Content Collection

As with any data-based research task, you’ll need to start by collecting and organizing your comments. It’s up to you how you do this, but you’re going to want to end up with a .txt document.

As a small example, I’m going to use comments from a travel blogging site I’ve worked on, and use comments located on the site’s ‘Asia’ section. In total, I’ve collected 127 comments from 5 different articles, with a total of 7,059 words.

When you come to performing a comment analysis yourself, it’s a good idea to separate different sections of your website into different text documents. Each of these will serve as a base for your analysis, with the rationale that different people come to your site for different reasons and so visit different sections. Whilst you can divide your comments in any way you wish, make sure that your system makes sense.

Removing Comment Boilerplate

Before we get to the analysis itself, we’re going to start by removing any boilerplate from our collection phase. This includes things like dates and times. We also want to remove any of our own comments and replies from the data – we already know what we sound like (unless we’re new and performing an audit).

The easiest way to do this is with the Microsoft Word Replace tool and some careful scrolling and deleting. If you have a lot of data that you’ve collected and you can’t be bothered to trawl through it all, you can use TextCrawler to make changes a little faster. Just be careful that you don’t delete data which you actually want.

For my content, I wanted to delete responses from the site owner, dates, times, and the ‘REPLY’ button text which came with each comment. There were also several names of commenters which I don’t necessarily need, but which would have been too time-consuming to delete.

In the end, the structure of each of my individual comments went from this:

  • [Name] [Date] [Time] [Comment] [REPLY]

To this:

  • [Name] [Comment]
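If you’d rather script this cleanup than click through a Replace dialog, a couple of regular expressions will handle it. The comments, names, and date format below are invented for illustration; adapt the patterns to whatever your comment export actually looks like:

```python
import re

# Invented sample in the [Name] [Date] [Time] [Comment] [REPLY] shape
raw_comments = (
    "JaneD 01/02/2018 10:32 Great guide, we crossed this border last year! REPLY\n"
    "TomB 03/04/2018 14:07 We are looking to be there late January. REPLY"
)

# Strip dates (dd/mm/yyyy) and times (hh:mm), then the trailing REPLY label,
# leaving [Name] [Comment]
cleaned = re.sub(r"\s\d{2}/\d{2}/\d{4}\s\d{2}:\d{2}", "", raw_comments)
cleaned = re.sub(r"\s*REPLY\s*$", "", cleaned, flags=re.MULTILINE)
print(cleaned)
```

Run the result past your eyes once before analysis; an over-eager pattern can delete data you actually want.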

Analysis

Popular Keywords

Popping the ‘Asia.txt’ file into AntConc, I’m going to start by looking at the frequency list and seeing if I can spot anything interesting.

Firstly, look for any topics which have been mentioned a lot. The sample I’m using is too small to draw any real conclusions from, but if you’re dealing with comments from over 200 different posts, you’re going to start seeing patterns emerge. In this small article, I’m seeing a lot of mentions of ‘border’. This is probably because one of the articles selected is about border crossing. I’m also seeing frequent mentions of particular destinations – the destinations which the articles were about.

Finally, I’m noticing that ‘visa’ is being mentioned a lot. None of the articles I’ve selected were exclusively about visas – although the border crossing article touches on them. The frequency of ‘visa’ may mean that a lot of commenters want to know more about visa issues and processes. We’ll explore this later.

Looking Deeper

As we’re dealing with comments, we’re particularly interested in finding personal pronouns which refer to the author of the comment. This includes “I” and “we”.

With these, we’re going to want to take a deeper look. If you’re using AntConc, you can click on a particular word and it will open up its concordance lines (lines of text that include your selected word).
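If you’d like to script the same lookup outside AntConc, a basic concordance function is only a few lines. The sample sentence is invented for illustration:

```python
def concordance(text, node, width=4):
    """Return simple concordance lines: `width` words either side of `node`."""
    words = text.split()
    lines = []
    for i, w in enumerate(words):
        if w.lower() == node.lower():
            lines.append(" ".join(words[max(0, i - width):i + width + 1]))
    return lines

text = "last year we arrived at the border crossing and we were not interested in tubing"
for line in concordance(text, "we"):
    print(line)
```

Each returned line gives you just enough surrounding context to judge how the word is being used.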

Selecting “we”, I’m greeted by a lot of content about similar experiences which readers have had or plan to have:

  • We are looking to be there late January…
  • we arrived at the border crossing…

And some about how the reader had discovered something new in the post, which they didn’t know about when they visited.

  • we could have enjoyed it.
  • we didn’t know enough about this city…
  • We didn’t stop in…
  • we were not interested in tubing, but…

We can then go back to the comments and look at the context within which these statements fall. From that we can gather the full extent of the comment’s meaning.

For instance, the content of ‘we didn’t know enough about this city’ was within a comment praising the article for its detailed information. This means that there is an opportunity to create even more detailed content about that city – especially if multiple commenters have mentioned similar things. Moreover, if several commenters have stated that they plan on visiting a location, we can assume that further content related to that topic would be useful.

Looking back at the ‘visa’ keyword, a deeper look at concordance lines reveals a lot of advice on how to obtain a visa:

  • it is possible to get the visa on arrival
  • we were exempted from a visa
  • visa situation is getting stricter

This definitely affords us a future content opportunity, as it is something which hasn’t been covered in enough detail in the original article.

Analyzing Spread

The final step of this analysis is analyzing spread. It’s great being able to identify new content opportunities through frequency, but frequency doesn’t mean a lot if it’s all located in one article. Consequently, we’re going to have a look at the concordance plot to see how our keywords are dispersed throughout the comments.

Keyword Spread

Firstly, looking at the keyword ‘visa’, you’ll notice that it is dispersed throughout three major areas. This indicates that visas are actually a common issue with the Asia section of the blog, and a topic which this audience desires more information about.

With ‘border’, we’re also seeing a fairly even spread across the second half of the corpus. However, as one of our articles is about a border crossing, it might not be as useful a topic to create new content on.
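A crude, text-only stand-in for that dispersion check is to split the corpus into equal slices and count hits per slice. The sample sentence below is invented; on a real comment corpus the bucket counts reveal whether a keyword is spread out or clustered:

```python
def spread(text, keyword, buckets=4):
    """Count keyword hits in equal slices of the corpus -- a crude
    text-only stand-in for a concordance plot."""
    words = text.lower().split()
    size = max(1, len(words) // buckets)
    return [words[i:i + size].count(keyword.lower())
            for i in range(0, len(words), size)]

text = "visa rules differ by country and the visa situation is getting stricter so check visa requirements early"
print(spread(text, "visa"))
```

Hits in most slices suggest a genuinely recurring concern; hits piled into one slice suggest a single article is driving the frequency.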

What About Everyone Else?

You might be aware of one glaringly obvious problem at this point… What about the people who don’t comment?

Well, I never said that this is a 100% airtight and exclusive technique for ensuring you’re capturing the right market segments. It is, however, a way to capture those who are likely passionate about your content. After all, it takes a lot more effort to leave a comment on something than just click away (unless it’s spam).

Market Segmentation is Plural

It’s important to remember that there are multiple ways of ‘splitting a market’. This is just one, and the segmentation itself largely relies on having clear-cut content topics in place. This analysis should help you to find opportunities.