How Google Works

Posted by Rahul Goel | 10:23 AM


This post looks at how search engines work, specifically Google.
Why Google? They own the largest share of all search engine advertising on the internet, anywhere from 50% to 85% depending on who you want to believe. Even at 50%, though, they are the ones I am going to focus on if I am going to put a lot of time and energy into SEO.
Search engines like Google have a job to do. They want (and need) to optimize our search results. If we are typing in keywords to find information about Martin guitars, we don't really want to see web site after web site in our listings pertaining to Wilson baseball bats. We have an expectation of getting search results closely related to what we are searching for.
Search engines also want to create a level playing field for those competing for top spots within each specific search phrase.
Google basically performs several tasks: requesting pages across the internet to find fresh content, indexing all of the important words, comparing all of the information they have indexed, and producing results that match what you and I are looking for when we do a search.
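To make the "indexing the important words" step a little more concrete, here is a minimal sketch of a word index over two made-up pages. The stop-word list, the example URLs, and the function names are all my own assumptions for illustration; this is not how Google actually indexes or ranks pages.

```python
import re
from collections import defaultdict

# Hypothetical stop-word list; a real engine's idea of "important" words is far richer.
STOP_WORDS = {"a", "an", "and", "the", "of", "for", "in", "to", "is"}

def tokenize(text):
    """Lowercase the text and keep only the words that are not stop words."""
    return [w for w in re.findall(r"[a-z0-9]+", text.lower()) if w not in STOP_WORDS]

def build_index(pages):
    """Map each word to the set of page URLs that contain it."""
    index = defaultdict(set)
    for url, text in pages.items():
        for word in tokenize(text):
            index[word].add(url)
    return index

def search(index, query):
    """Return the pages that contain every word in the query, sorted by URL."""
    words = tokenize(query)
    if not words:
        return []
    matches = set.intersection(*(index.get(w, set()) for w in words))
    return sorted(matches)

# Made-up pages standing in for crawled content.
pages = {
    "http://example.com/guitars": "Martin guitars for sale and the history of Martin guitars",
    "http://example.com/bats": "Wilson baseball bats for sale",
}
index = build_index(pages)
print(search(index, "Martin guitars"))
```

A search for "Martin guitars" only returns the guitar page, which is the kind of behavior we expect from a search engine: no baseball bats in the listings.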
Here is a basic overview of what they do.
Within Google, there is a distributed network made up of thousands of ordinary, average computers. These computers are all connected to each other and work in parallel: each one works independently, and all of them work at the same time. One way to visualize this is as a chain in which each link connects to the next. The speed is incredible when doing data processing. The program that runs on these computers to fetch web pages is called Googlebot.
There are multiple steps in what Google does, but I am going to focus on Googlebot. Googlebot is always working to obtain fresh content. It does not "go to" each web site like you would think; rather, it works like a web browser. It sends requests to web servers asking for pages.
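Here is a small sketch of what "working like a web browser" means: building an ordinary HTTP GET request for a page. The bot name "ExampleBot/0.1" and the URL are made up for illustration, and the request is only built here, not sent.

```python
from urllib.request import Request

def build_page_request(url, user_agent="ExampleBot/0.1"):
    """Build (but do not send) a GET request for a page, as a browser or crawler would."""
    # The User-Agent value is a hypothetical bot name, not Googlebot's real one.
    return Request(url, headers={"User-Agent": user_agent}, method="GET")

req = build_page_request("http://example.com/index.html")
print(req.get_method(), req.full_url)
print(req.host)
```

Passing `req` to `urllib.request.urlopen` would actually send it to the web server, which answers with the page content just as it would for a browser.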
One way they find out about a site is through the add URL form, which for Google is at http://google.com/addurl.html. If you have created a web site, this is where you would submit your domain name. When they do come and "crawl" your site, they will also record all of the links they find on it. They place these into another index file and send out requests to them later.
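As a rough illustration of picking the links out of a crawled page, here is a sketch using Python's built-in HTML parser. The class name, the sample HTML, and the URLs are assumptions of mine; a real crawler has to cope with much messier pages.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin

class LinkCollector(HTMLParser):
    """Collect the href of every <a> tag, resolved against the page's own URL."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        for name, value in attrs:
            if name == "href" and value:
                # Turn relative links like /about.html into full URLs.
                self.links.append(urljoin(self.base_url, value))

def extract_links(base_url, html):
    """Return every link found in the page, in the order it appears."""
    collector = LinkCollector(base_url)
    collector.feed(html)
    return collector.links

html = '<p><a href="/about.html">About</a> <a href="http://example.org/">Other</a></p>'
print(extract_links("http://example.com/index.html", html))
```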
Note: When you submit your site through the add URL form, it can take a month or more before Google gets around to crawling it. There is a way around this, which we will talk about later!
Getting back to the web servers: typically, every domain name points to a server. That server belongs to the hosting company you are paying for hosting services.
Again, when a web server gets a request and sends its fresh content pages back to those thousands of computers, Googlebot takes the links it finds on those pages and places them into a queue. At a later time, it sends requests to the servers those links point to and grabs the content from those pages. This process goes on and on forever!
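The queue idea can be sketched in a few lines. This is only a toy: the `FAKE_WEB` dictionary stands in for real web servers, the function names are mine, and a `max_pages` cap is added so the loop stops instead of running forever.

```python
from collections import deque

# Hypothetical stand-in for the web: each URL maps to the links found on that page.
FAKE_WEB = {
    "http://a.example/": ["http://a.example/about", "http://b.example/"],
    "http://a.example/about": ["http://a.example/"],
    "http://b.example/": ["http://c.example/"],
    "http://c.example/": [],
}

def crawl(seed_urls, fetch_links, max_pages=100):
    """Visit pages in queue order, adding each newly found link to the back of the queue."""
    queue = deque(seed_urls)
    seen = set(seed_urls)  # links already queued, so no page is fetched twice
    visited = []
    while queue and len(visited) < max_pages:
        url = queue.popleft()
        visited.append(url)
        for link in fetch_links(url):
            if link not in seen:
                seen.add(link)
                queue.append(link)
    return visited

order = crawl(["http://a.example/"], lambda url: FAKE_WEB.get(url, []))
print(order)
```

Starting from a single submitted URL, the crawl reaches every page that is linked from it, directly or indirectly, which is why links pointing to your site matter.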
This all occurs very quickly. While this is going on, Googlebot also checks for duplicate links and content so that it does not waste time re-indexing a web site. During all of this, Googlebot also places the URLs of sites that need to be revisited into another queue (a "visit the site soon" queue). You actually determine when they come back by how often your web site's content changes. The more often it changes, the more often the site will be crawled.
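One simple way to picture "the more often it changes, the more often it gets crawled" is a revisit interval that shrinks when a page has changed and grows when it has not. The halving and doubling rule, the day limits, and the names below are assumptions I made up for illustration; Google does not publish its actual scheduling.

```python
import hashlib

# Hypothetical limits on how soon or how late a page gets revisited.
MIN_INTERVAL_DAYS = 1
MAX_INTERVAL_DAYS = 64

def content_fingerprint(content):
    """Hash the page content so two visits can be compared cheaply."""
    return hashlib.sha256(content.encode("utf-8")).hexdigest()

def next_interval(current_days, changed):
    """Halve the wait if the page changed, double it if it did not."""
    if changed:
        return max(MIN_INTERVAL_DAYS, current_days // 2)
    return min(MAX_INTERVAL_DAYS, current_days * 2)

class RevisitSchedule:
    """Track one page and decide how many days to wait before the next visit."""

    def __init__(self, start_days=8):
        self.interval_days = start_days
        self.last_fingerprint = None

    def record_visit(self, content):
        fingerprint = content_fingerprint(content)
        # The very first visit counts as a change, since nothing was stored before.
        changed = fingerprint != self.last_fingerprint
        self.last_fingerprint = fingerprint
        self.interval_days = next_interval(self.interval_days, changed)
        return self.interval_days

schedule = RevisitSchedule()
for content in ["version 1", "version 2", "version 2", "version 2"]:
    print(schedule.record_visit(content))
```

In this sketch, two visits that find new content bring the crawler back sooner, and every visit that finds the same old content pushes the next one further away.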
Note: What I just explained is very important for SEO. You need to be consistent with your changes. You do not want Google to crawl your site expecting fresh content and find nothing new. If that happens, it will be later and later before they come back and visit your site.
This process, where they come back to your site based on content changes, is called a "fresh crawl".
From time to time, they also do what is called a "hard crawl". This occurs infrequently over the course of a year. During hard crawls, they sweep the internet for everything. They then make comparisons and determine your web site's position, along with that of every other web site on the internet, in their search rankings for specific search phrases.
Note: I want to point out that when they crawl your site, they are only crawling the page or "pages" you have made changes to. They are not indexing your whole site.
Had enough? It is not essential to know all of the above, but there are people like me who like to understand the bigger picture in some way. Knowing this can help, though, when you are doing SEO!
