Yandex raises the bar with new intelligent search algorithm Korolyov

The Russian search engine Yandex has set the bar high with what we at Adrac think is the first acknowledged full-scale integration of crowd-sourced data into a search engine algorithm.

The integration is part of the new version of the Yandex search platform. The official announcement comprised two significant developments for search:

  • Korolyov – an upgraded version of a deep neural network-based search algorithm, something that is not entirely new in the world of search.
  • Toloka – a mass-scale, crowd-sourced platform for search assessors, something that is really exciting.

Korolyov is an artificial intelligence (AI) system that builds on work Yandex started with its previous neural network search algorithm, Palekh. AI has been added to the fabric of search specifically to improve how Yandex deals with long-tail search queries: it gives the search engine a better understanding of the web’s ecosystem by analysing the complete web page rather than the page title alone (as Palekh did). It provides scale benefits too, being able to deal with thousands more documents in real time than previous algorithms.

The use of AI means Korolyov will also improve itself with each data point it scans, using self-learning logic built into the system. This information is then fed into Yandex’s machine learning ranking algorithm, which factors in additional ranking signals before returning the results to the user. All of this happens in almost real time, with the intention of giving the search user the best possible experience.
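
To make that flow a little more concrete, here is a minimal sketch of how a semantic match between a query and a whole page could be blended with other ranking signals. It is our own illustration, not Yandex's implementation: the vectors, the `other_signals` score, the `semantic_weight` value and the function names are all assumptions made up for the example.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank_documents(query_vec, documents, semantic_weight=0.7):
    """Order documents by a blend of semantic match and other ranking signals.

    Each document is a dict with hypothetical keys:
      "url"           - identifier of the page
      "vector"        - a semantic vector for the whole page
      "other_signals" - a single pre-computed score in [0, 1] standing in
                        for every other ranking factor
    """
    def score(doc):
        semantic = cosine_similarity(query_vec, doc["vector"])
        return semantic_weight * semantic + (1 - semantic_weight) * doc["other_signals"]

    ranked = sorted(documents, key=score, reverse=True)
    return [doc["url"] for doc in ranked]
```

In a real system the blend would be learned by the ranking algorithm rather than fixed by hand; the fixed weight here only shows where the semantic signal would sit among the others.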

The machine learning algorithm, known as MatrixNet, incorporates data from Yandex.Toloka, the mass-scale, crowd-sourced platform on which search assessors help train the machine learning algorithm. In essence, this means that a team of human assessors also analyses web page content at scale, then feeds the results back into the system so MatrixNet can improve its own understanding.
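
One simple way to picture the assessor feedback loop is majority voting: several people judge the same query and page, and only a clear majority verdict becomes a training label. The sketch below is a hedged illustration of that idea, not a description of how Toloka or MatrixNet actually work; the label names, the `min_votes` threshold and the function name are our own assumptions.

```python
from collections import Counter, defaultdict


def aggregate_judgments(judgments, min_votes=3):
    """Turn individual assessor judgments into one label per (query, url) pair.

    `judgments` is an iterable of (query, url, label) tuples. A pair only
    receives a label when at least `min_votes` assessors judged it and one
    label won a strict majority; everything else is left out.
    """
    votes = defaultdict(list)
    for query, url, label in judgments:
        votes[(query, url)].append(label)

    labels = {}
    for pair, pair_votes in votes.items():
        if len(pair_votes) < min_votes:
            continue  # too few opinions to trust
        top_label, top_count = Counter(pair_votes).most_common(1)[0]
        if top_count * 2 > len(pair_votes):  # strict majority only
            labels[pair] = top_label
    return labels
```

Dropping pairs without a clear majority is one way to limit the effect of a single assessor's individual bias, at the cost of throwing away some judgments.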

Yandex is unique in coupling AI with large-scale, human crowd-sourced data, and that’s what makes this development in search so exciting.

In the same vein that performance car manufacturers took hybrid engine technology and used the instant response of electric motors to plug the performance gap typically left by petrol or turbo engines, Yandex states that the big strength of pairing machine learning with crowd-sourced input is that it gives Yandex a much better understanding of user intent. The result: the returned search results should be highly relevant.

What we find most interesting here at Adrac is not just the fact that this is the first acknowledged full-scale integration of crowd-sourced data into a search engine algorithm. It’s also the unique combination of the two: with AI and crowd-sourced analysis working together, it should be impossible to manipulate the search engine results for any prolonged period of time, a challenge that the team here at Adrac relishes.

Google, the current search behemoth, has its own human editors who quietly work away in the background. They make manual changes and generally review “flagged” possible violations generated by automated rules. The problem with this type of human intervention is that it’s almost a blinkered system, fed by machines, with humans on hand only to confirm or deny that the machine got it right. The rules are also very grey and steeped in secrecy (the only significant data on this being an “unofficially” leaked internal manual for the team).

Yandex takes out the hedging, integrating the human aspect into the very fabric of the algorithm at the same level as the AI, so that the human brain (the superior processor) is on a par with the computer. That is incredibly bold and exciting.

The Yandex approach has always been innovation “outside the box”, and its groundbreaking use of a wholesale hive, or collective set, of human brains in place of supercomputers and next-gen computer chips is just another in a long line of brave and innovative approaches adopted by the company.

The approach is more agile and scalable, suggesting there’s a lot of scope for it to be expanded and explored even further as Yandex and the other search engines begin to understand the partnership and its future applications, and begin to realise its colossal potential.

It is not all plain sailing and there are areas for concern: who adjudicates the results, and what impact does individualism have? What safeguards and processes are in place to provide the right training, guidelines and adjudication of reviews? If those questions are solved, what Yandex can achieve with this approach is absolutely mind-blowing.

There will also be internal concerns for Yandex, namely protecting the human side of its algorithm. In order to get the best results, the training has to be solid. For good training, you have to share, and if that happens there’s the prospect of imitation and possible outside influence or interference, which could negatively impact the search results and the company as a whole.

The team and I will be monitoring the progress of this exciting innovation and hope to be able to report back on implementation, effectiveness and future applications.