I started writing this shortly before I became enamored of the use of vectors in the analysis of text, which greatly affected how I saw machine learning as a process. Coming back to it nearly two years later, I see little sense in continuing along the same vein. While I don’t think I was completely out in left field, my digging into such things as TensorFlow has definitely led me down a different path.
So I’m putting this out there, and I’ll be following up shortly with a second part, dealing with vectors and how they may be in play. In the meantime, here’s something to think about:
Disclaimer: This post is admittedly based upon a high degree of conjecture, and my reliance on my sexagenarian memory. Expect inaccuracies – this is just a thought exercise.
To begin with, you can expect a fair amount of dissension about this post, for at least a few good reasons:
First of all, exactly what Google’s algorithms can do and how well they can do it (especially the latter) are based largely upon conjecture. My take on it is no different in that regard… I’m not embedded in the upper echelon of Google’s technocracy, so I can only base my opinion on what I’ve observed. From that, I (try to) make a reasonable leap of logic. So anyone using a title like the above is likely to attract a few folks, eager to dissent. Hopefully, most folks in this business know that algorithms don’t actually “think”.
Second, not everyone in our industry really tests theories. Most, being too busy (or otherwise disinclined) to test for themselves, just accept the results offered by others. Consequently, “findings”, whether valid or not, are accepted and repeated, often based upon how reasonable they seem to be (or how well they fit someone’s preconceived notions). All too often, more emphasis seems to be placed upon how often they’re repeated and by whom. So folks may or may not be interested in hearing what crackpot notion I’m proposing. Understandable.
Finally, since this post is a bit of a ramble, based solely upon my impressions to date, without doing any research while I’m writing it, I’ll undoubtedly forget some of the stuff I’ve read. I may even make some misstatements. Feel free to correct me in the comments. I can certainly afford to learn more and I’m sure there are others reading that can, too.
What’s an Algorithm?
Since semantics will inevitably be mentioned here, let’s do it now. Semantically, the title is totally misleading – algorithms don’t think… they calculate, which is vastly different. So for the sake of clarity, let’s begin by defining what an algorithm really is.
Merriam-Webster defines it as: a step-by-step procedure for solving a problem or accomplishing some end, especially by a computer. I don’t think that’s quite as clear as it could be, although those who have any knowledge of coding would understand it intuitively. For others, I’d add that it’s simply a mathematical process.
Preset thresholds and true/false states are very basic conditions that an algo may recognize and act upon. But algorithms like Google’s ranking algo are exponentially more complex. They can vary their “then” actions based upon degrees of “if”. And those degrees may be affected by the state or degree of any number of other elements. But they’re still no more than mathematical formulae.
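To make the “preset thresholds and true/false states” idea concrete, here’s a toy sketch. To be clear, the signal names, thresholds, and actions below are entirely invented for illustration — nothing here comes from Google’s actual systems:

```python
# Hypothetical example -- signal names and the 0.8 threshold are made up,
# purely to illustrate acting on a preset threshold and a true/false state.

SPAM_LINK_THRESHOLD = 0.8  # a preset: fixed until someone changes it

def classify_page(link_spam_score: float, has_duplicate_content: bool) -> str:
    """A basic 'if/then' rule of the kind described above."""
    if link_spam_score > SPAM_LINK_THRESHOLD:  # threshold condition
        return "demote"
    if has_duplicate_content:                  # true/false condition
        return "filter"
    return "rank normally"

print(classify_page(0.9, False))  # demote
print(classify_page(0.2, True))   # filter
print(classify_page(0.2, False))  # rank normally
```

The point is just how rigid this is: the rule does exactly the same thing until a human edits the threshold — which is the contrast the next paragraphs draw with learning algorithms.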
Probabilities enter into most algorithmic analyses, but Google has taken this to a new level with their learning algorithms. The presets are now fluid, and can be self-adjusted, based upon previous findings. With each iteration, the algo becomes a little “smarter”… and presumably, more accurate. That ups the game considerably.
How do Algorithms Learn?
Okay, now we’re starting to get a little above my pay-grade… I’d defer this question to my buddy, Bill Slawski, for a meatier answer. If you talk to Bill, I imagine he’d get into the differences between statistical and algorithmic learning theories. I’ve read a lot on the topic, but I’m still a long way from understanding all the subtleties. To me, there seem to be a lot of similarities.
But I’ll take a stab at what I think I understand, just ’cause that’s how I roll (so take it with a grain of salt, folks).
Many factors can come into play, by which an algorithm can determine that either an if or a then (or an else, for that matter) needs to be adjusted. Data from other algorithms may be applied, specific findings may prove true or false in the majority of instances, raw data may be introduced… I don’t honestly know if we might be talking about a dozen or so types of data or hundreds. My bet would be less than a dozen data types – I’d love to know if I’m way off base there.
But essentially, using the data at hand, a learning algorithm will determine the probability of a previous finding being accurate or not, and adjust its own presets to some degree, to attempt to correct or dial in its future findings. As complex as the aggregated process may be, I think the basic strategy is probably really that simple.
Type of Algorithm
At this point, I took a break and called Bill Slawski and chatted with him about it. His first comment was “I’m not a search engineer”, which was perfectly valid. I knew that, but I also know that Bill knows a hell of a lot more about how search works than most people. So I pressed on.
I told him I was a little confused about the supposed differences between probabilistic and statistical algorithms, and he told me that he saw a lot of similarities, too. Trust me when I say that anytime Bill agrees with you, you feel somewhat uplifted. At the very least, a little less like a babe in the woods.
The conclusion I came to is that from our standpoint, as users and observers, there’s a bit of overlap between the two, inasmuch as stats enter into probabilistic projections, to a degree. The amount of cross-referencing and comparison Google must perform every passing second boggles the imagination.
But that’s just the starting pistol. Determining how often, to what degree and why a particular result should be considered a match for a given query is something Google has been doing for a long time – and doing pretty well, all things considered. Incorporating other signals into that matching process is nothing new for them, either. CTR and bounce-rate, for instance, have long been signals of user satisfaction with a given search result.
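For illustration, here’s one naive way such signals could be folded into a match score. The weights and the formula are pure invention on my part — the post only claims that CTR and bounce rate have long served as satisfaction signals, not how they’re combined:

```python
# Hypothetical blend of a relevance score with user-satisfaction signals.
# Weights (w_rel, w_ctr, w_bounce) are invented for illustration only.

def blended_score(relevance, ctr, bounce_rate,
                  w_rel=0.7, w_ctr=0.2, w_bounce=0.1):
    """All inputs in [0, 1]. Higher CTR helps; higher bounce rate hurts."""
    return w_rel * relevance + w_ctr * ctr + w_bounce * (1.0 - bounce_rate)

# Two results equally relevant on paper: user behavior breaks the tie.
a = blended_score(relevance=0.8, ctr=0.30, bounce_rate=0.2)
b = blended_score(relevance=0.8, ctr=0.05, bounce_rate=0.7)
print(a > b)  # True -- the better-received result scores higher
```

In a learning system, those weights themselves would be among the presets the algorithm could adjust — which is where the next paragraph picks up.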
Of course, in the early days, engineers had to tell the algorithms what signals and thresholds to utilize in the matchmaking process, and until those settings were updated, the algorithms continued to follow the same guidelines. Now, however, these “learning algorithms” can adjust those signals and thresholds independently, according to whatever criteria the engineers established as variables.
Before, hundreds of man-hours could go into even relatively simple updates… probably thousands, for the more complex changes. But the speed with which the algorithms can process inputs and fine-tune themselves is exponentially faster – so it’s much more efficient. And I think one could reasonably argue that it’s also becoming increasingly accurate, at least in most instances.
Image credit: https://www.lucidchart.com/pages/templates/flowchart/algorithm-flowchart-template