A Google research paper describes an algorithm that can identify low-quality webpages, similar to what the helpful content signal does.
The algorithm features low resource usage and the ability to handle web-scale analysis.
The algorithm does not have to be trained to find specific kinds of low-quality content; it can learn on its own.
Nobody outside of Google can say with certainty that this research paper is the basis of the helpful content signal.
Google generally does not identify the underlying technology of its various algorithms, such as Penguin, Panda, or SpamBrain.
So one can't say with certainty that this algorithm is the helpful content algorithm; one can only speculate and offer an opinion about it.
But it's worth a look, because the similarities are eye-opening.
Google has provided a number of clues about the helpful content signal but there is still a lot of speculation about what it really is.
The first clues were in a December 6, 2022 tweet announcing a helpful content update.
The tweet said:
A classifier, in machine learning, is something that categorizes data (is it this or is it that?).
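To make the term concrete, here is a minimal, hand-written sketch of what a binary classifier does. The cue words and labels are invented purely for illustration; Google's actual classifier is a learned machine-learning model, not a hand-coded rule like this.

```python
# Toy illustration of a binary classifier: it categorizes a page's text
# as "helpful" or "unhelpful" by counting cue words. The cue lists below
# are made up for this example and have nothing to do with Google's
# real signals.

HELPFUL_CUES = {"guide", "explains", "example", "steps"}
UNHELPFUL_CUES = {"buy", "click", "cheap", "deal"}

def classify(text: str) -> str:
    """Return 'helpful' or 'unhelpful' based on simple word counts."""
    words = text.lower().split()
    helpful = sum(w in HELPFUL_CUES for w in words)
    unhelpful = sum(w in UNHELPFUL_CUES for w in words)
    return "helpful" if helpful >= unhelpful else "unhelpful"

print(classify("this guide explains each example in steps"))  # helpful
print(classify("click here for a cheap deal buy now"))        # unhelpful
```

The point is simply the "is it this or is it that?" decision: a real classifier learns its decision rule from labeled examples rather than from hand-picked word lists.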
The Helpful Content algorithm, according to Google’s explainer (What creators should know about Google’s August 2022 helpful content update), is not a spam action or a manual action.
The helpful content update explainer says that the helpful content algorithm is a signal used to rank content.
The interesting thing is that the helpful content signal (apparently) checks if the content was created by people.
The concept of content being “by people” is repeated three times in the announcement, apparently indicating that it’s a quality of the helpful content signal.
And if content is not written "by people," then it is machine-generated. That matters here because the algorithm discussed in the research paper is related to detecting machine-generated content.
Lastly, Google’s blog announcement seems to indicate that the Helpful Content Update isn’t just one thing, like a single algorithm.
Danny Sullivan writes that it's a "series of improvements," which, if I'm not reading too much into it, suggests that it's not one algorithm or system but several that work together to weed out unhelpful content.