Sunday, August 30, 2009

Quick Tips To Search Google Like An Expert

If you’re like me, you probably use Google many times a day. But, chances are, unless you are a technology geek, you probably still use Google in its simplest form. If your current use of Google is limited to typing a few words in, and changing your query until you find what you’re looking for, then I’m here to tell you that there’s a better way – and it’s not hard to learn. On the other hand, if you are a technology geek and can already use Google like the best of them, then I suggest you bookmark this article of Google search tips. You’ll then have the tips on hand the next time you’re ready to pull your hair out in frustration watching a neophyte repeatedly type in basic queries in a desperate attempt to find something.

Expert Google Search Tips

1. Explicit Phrase:
Let's say you are looking for content about internet marketing. Instead of just typing internet marketing into the Google search box, you will likely be better off searching explicitly for the phrase. To do this, simply enclose the search phrase within double quotes.

Example: "internet marketing"

2. Exclude Words:
Let's say you want to search for content about internet marketing, but you want to exclude any results that contain the term advertising. To do this, simply use the "-" sign in front of the word you want to exclude.

Example Search: internet marketing -advertising

3. Site Specific Search:
Often, you want to search a specific website for content that matches a certain phrase. Even if the site doesn’t support a built-in search feature, you can use Google to search the site for your term. Simply use the "site:somesite.com" modifier.

Example: "internet marketing" site:www.smallbusinesshub.com

4. Similar Words and Synonyms:
Let's say you want to include a word in your search, but also want to include results that contain similar words or synonyms. To do this, use the "~" in front of the word.

Example: "internet marketing" ~professional

5. Specific Document Types:
If you’re looking to find results that are of a specific type, you can use the modifier "filetype:". For example, you might want to find only PowerPoint presentations related to internet marketing.

Example: "internet marketing" filetype:ppt

6. This OR That:
By default, when you do a search, Google will include all the terms specified in the search. If you instead want to match any one of several terms, you can use the OR operator. (Note: the OR has to be capitalized.)

Example: internet marketing OR advertising

7. Phone Listing:
Let's say someone calls you on your mobile number and you don't know who it is. If all you have is a phone number, you can look it up on Google using the phonebook feature.

Example: phonebook:617-555-1212 (note: the provided number does not work – you’ll have to use a real number to get any results).

8. Area Code Lookup:
If all you need is to look up an area code, just enter the 3-digit code and Google will tell you where it's from.

Example: 617

9. Numeric Ranges:
This is a rarely used but highly useful tip. Let's say you want to find results that contain any of a range of numbers. You can do this by using the X..Y modifier (in case this is hard to read, what's between the X and Y are two periods). This type of search is useful for years (as shown below), prices, or anywhere else you want to provide a range of numbers.

Example: president 1940..1950

10. Stock (Ticker Symbol):
Just enter a valid ticker symbol as your search term and Google will give you the current financials and a quick thumbnail chart for the stock.

Example: GOOG

11. Calculator:
The next time you need to do a quick calculation, instead of bringing up the Calculator applet, you can just type your expression into Google.

Example: 48512 * 1.02

12. Word Definitions:
If you need to quickly look up the definition of a word or phrase, simply use the "define:" command.

Example: define:plethora
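One last trick for the geeks: these operators combine cleanly, so they are easy to generate from a script. Below is a minimal Python sketch; the google_query helper and its parameters are my own invention for illustration, not any official Google API:

    from urllib.parse import urlencode

    def google_query(*terms, site=None, filetype=None, exclude=()):
        parts = list(terms)
        parts += ["-" + word for word in exclude]     # tip 2: exclude words
        if site:
            parts.append("site:" + site)              # tip 3: site-specific search
        if filetype:
            parts.append("filetype:" + filetype)      # tip 5: document types
        return "https://www.google.com/search?" + urlencode({"q": " ".join(parts)})

    # tip 1 (explicit phrase) is just a quoted term:
    print(google_query('"internet marketing"', filetype="ppt", exclude=["advertising"]))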

Hope this list of Google search tips proves useful in your future Google searches. If there are any of your favorite Google expert power tips that I’ve missed, please feel free to share them in the comments.

Google search basics and tricks

Search is simple: just type whatever comes to mind in the search box, hit Enter or click on the Google Search button, and Google will search the web for pages that are relevant to your query.

Most of the time you'll find exactly what you were looking for with just a basic query. However, the following tips can help you refine your technique to make the most of your searches. Throughout the article, we'll use square brackets [ ] to signal queries, so [ black and white ] is one query, while [ black ] and [ white ] are two.
Some basic facts

* Every word matters. Generally, all the words you put in the query will be used. There are some exceptions.
* Search is always case insensitive. Searching for [ new york times ] is the same as searching for [ New York Times ].
* With some exceptions, punctuation is ignored (that is, you can't search for @#$%^&*()=+[]\ and other special characters).

Guidelines for better search


* Keep it simple. If you're looking for a particular company, just enter its name, or as much of its name as you can recall. If you're looking for a particular concept, place, or product, start with its name. If you're looking for a pizza restaurant, just enter pizza and the name of your town or your zip code. Most queries do not require advanced operators or unusual syntax. Simple is good.
* Think how the page you are looking for will be written. A search engine is not a human, it is a program that matches the words you give to pages on the web. Use the words that are most likely to appear on the page. For example, instead of saying [ my head hurts ], say [ headache ], because that's the term a medical page will use. The query [ in what country are bats considered an omen of good luck? ] is very clear to a person, but the document that gives the answer may not have those words. Instead, use the query [ bats are considered good luck in ] or even just [ bats good luck ], because that is probably what the right page will say.
* Describe what you need with as few terms as possible. The goal of each word in a query is to focus it further. Since all words are used, each additional word limits the results. If you limit too much, you will miss a lot of useful information. The main advantage to starting with fewer keywords is that, if you don't get what you need, the results will likely give you a good indication of what additional words are needed to refine your results on the next search. For example, [ weather cancun ] is a simple way to find the weather and it is likely to give better results than the longer [ weather report for cancun mexico ].
* Choose descriptive words. The more unique the word is, the more likely you are to get relevant results. Words that are not very descriptive, like 'document,' 'website,' 'company,' or 'info,' are usually not needed. Keep in mind, however, that even if the word has the correct meaning but it is not the one most people use, it may not match the pages you need. For example, [ celebrity ringtones ] is more descriptive and specific than [ celebrity sounds ].

How to read search results

Google's goal is to provide you with results that are clear and easy to read. The diagram below points out four features that are important to understanding the search results page:

[Diagram: a sample search result with the title, snippet, URL and cached link labeled]

1. The title: The first line of any search result is the title of the webpage.
2. The snippet: A description of or an excerpt from the webpage.
3. The URL: The webpage's address.
4. Cached link: A link to an earlier version of this page. Click here if the page you wanted isn't available.

All these features are important in determining whether the page is what you need. The title is what the author of the page designated as the best short description of the page.

The snippet is Google's algorithmic attempt to extract just the part of the page most relevant to your query. The URL tells you about the site in general.

GOOGLE PAGE RANKING ALGORITHM

The PageRank Algorithm

The original PageRank algorithm was described by Lawrence Page and Sergey Brin in several publications. It is given by

PR(A) = (1-d) + d (PR(T1)/C(T1) + ... + PR(Tn)/C(Tn))

where
PR(A) is the PageRank of page A,
PR(Ti) is the PageRank of pages Ti which link to page A,
C(Ti) is the number of outbound links on page Ti and
d is a damping factor which can be set between 0 and 1.

So, first of all, we see that PageRank does not rank web sites as a whole, but is determined for each page individually. Further, the PageRank of page A is recursively defined by the PageRanks of those pages which link to page A.

The PageRank of pages Ti which link to page A does not influence the PageRank of page A uniformly. Within the PageRank algorithm, the PageRank of a page T is always weighted by the number of outbound links C(T) on page T. This means that the more outbound links a page T has, the less page A benefits from a link on page T.

The weighted PageRank of pages Ti is then added up. The outcome of this is that an additional inbound link for page A will always increase page A's PageRank.

Finally, the sum of the weighted PageRanks of all pages Ti is multiplied by a damping factor d which can be set between 0 and 1. Thereby, the extent of the PageRank benefit for a page from another page linking to it is reduced.
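Read literally, the formula translates directly into code. The following Python fragment is only an illustrative sketch (the pagerank_of function and its data layout are my own, not taken from the original papers):

    # PR(A) = (1-d) + d (PR(T1)/C(T1) + ... + PR(Tn)/C(Tn))
    def pagerank_of(page, pr, out_links, d=0.85):
        # the pages Ti that link to the given page
        inbound = [t for t, targets in out_links.items() if page in targets]
        # each Ti contributes its current PageRank divided by its
        # number of outbound links C(Ti)
        return (1 - d) + d * sum(pr[t] / len(out_links[t]) for t in inbound)

    out_links = {"A": ["B", "C"], "B": ["C"], "C": ["A"]}
    pr = {"A": 1.0, "B": 1.0, "C": 1.0}          # some current estimate
    print(pagerank_of("A", pr, out_links))       # only C links to A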

The Random Surfer Model

In their publications, Lawrence Page and Sergey Brin give a very simple intuitive justification for the PageRank algorithm. They consider PageRank as a model of user behaviour, where a surfer clicks on links at random with no regard towards content.

The random surfer visits a web page with a certain probability which derives from the page's PageRank. The probability that the random surfer clicks on one particular link is solely given by the number of links on that page. This is why one page's PageRank is not completely passed on to a page it links to, but is divided by the number of links on the page.

So, the probability for the random surfer reaching one page is the sum of probabilities for the random surfer following links to this page. Now, this probability is reduced by the damping factor d. The justification within the Random Surfer Model, therefore, is that the surfer does not click on an infinite number of links, but gets bored sometimes and jumps to another page at random.

The probability that the random surfer keeps clicking on links is given by the damping factor d, which is therefore set between 0 and 1. The higher d is, the more likely the random surfer is to keep clicking links. Since the surfer jumps to another page at random after he stops clicking links, this probability is implemented into the algorithm as the constant (1-d). Regardless of inbound links, the probability of the random surfer jumping to a page is always (1-d), so a page always has a minimum PageRank.

A Different Notation of the PageRank Algorithm

Lawrence Page and Sergey Brin have published two different versions of their PageRank algorithm in different papers. In the second version of the algorithm, the PageRank of page A is given as

PR(A) = (1-d) / N + d (PR(T1)/C(T1) + ... + PR(Tn)/C(Tn))

where N is the total number of all pages on the web. The second version of the algorithm, indeed, does not differ fundamentally from the first one. Regarding the Random Surfer Model, the second version's PageRank of a page is the actual probability for a surfer reaching that page after clicking on many links. The PageRanks then form a probability distribution over web pages, so the sum of all pages' PageRanks will be one.
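This probability interpretation can be checked empirically. Below is a small Python sketch that simulates the random surfer directly on a made-up three-page web (the same link structure as the example further down, with d = 0.85 as Page and Brin suggest); the visit frequencies approximate the second version's PageRank values:

    import random

    out_links = {"A": ["B", "C"], "B": ["C"], "C": ["A"]}
    d = 0.85                            # probability of following a link
    steps = 1_000_000
    visits = dict.fromkeys(out_links, 0)

    page = "A"
    for _ in range(steps):
        visits[page] += 1
        if random.random() < d:
            page = random.choice(out_links[page])    # follow a random link
        else:
            page = random.choice(list(out_links))    # get bored, jump anywhere

    for page in sorted(out_links):
        # each page's share of visits; multiply by the number of pages
        # to get the first version's PageRank
        print(page, visits[page] / steps)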

By contrast, in the first version of the algorithm the probability of the random surfer reaching a page is weighted by the total number of web pages. So, in this version PageRank is an expected value for the random surfer visiting a page if he restarts the procedure as often as the web has pages. If the web had 100 pages and a page had a PageRank value of 2, the random surfer would reach that page on average twice if he restarted the procedure 100 times.

As mentioned above, the two versions of the algorithm do not differ fundamentally from each other. A PageRank which has been calculated by using the second version of the algorithm has to be multiplied by the total number of web pages to get the according PageRank that would have been calculated by using the first version. Even Page and Brin mixed up the two algorithm versions in their most popular paper "The Anatomy of a Large-Scale Hypertextual Web Search Engine", where they claim the first version of the algorithm to form a probability distribution over web pages with the sum of all pages' PageRanks being one.

In the following, we will use the first version of the algorithm. The reason is that PageRank calculations by means of this algorithm are easier to compute, because we can disregard the total number of web pages.

The Characteristics of PageRank

The characteristics of PageRank shall be illustrated by a small example.

We regard a small web consisting of three pages A, B and C, whereby page A links to the pages B and C, page B links to page C and page C links to page A. According to Page and Brin, the damping factor d is usually set to 0.85, but to keep the calculation simple we set it to 0.5. The exact value of the damping factor d admittedly has effects on PageRank, but it does not influence the fundamental principles of PageRank. So, we get the following equations for the PageRank calculation:


PR(A) = 0.5 + 0.5 PR(C)
PR(B) = 0.5 + 0.5 (PR(A) / 2)
PR(C) = 0.5 + 0.5 (PR(A) / 2 + PR(B))

These equations can easily be solved. We get the following PageRank values for the single pages:

PR(A) = 14/13 = 1.07692308
PR(B) = 10/13 = 0.76923077
PR(C) = 15/13 = 1.15384615
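A quick way to double-check these fractions is to solve the system with exact arithmetic, for example using Python's fractions module (the substitution steps below are hardcoded for this three-page example):

    from fractions import Fraction as F

    d = F(1, 2)

    # Substituting PR(B) into PR(C) gives
    # PR(C) = (1-d)(1+d) + d(1+d)/2 * PR(A),
    # and substituting that into PR(A) = (1-d) + d PR(C) isolates PR(A).
    c0 = (1 - d) * (1 + d)
    c1 = d * (1 + d) / 2
    pr_a = ((1 - d) + d * c0) / (1 - d * c1)
    pr_b = (1 - d) + d * pr_a / 2
    pr_c = c0 + c1 * pr_a

    print(pr_a, pr_b, pr_c)      # 14/13 10/13 15/13
    print(pr_a + pr_b + pr_c)    # 3, the total number of pages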

It is obvious that the sum of all pages' PageRanks is 3 and thus equals the total number of web pages. As shown above, this is not a result specific to our simple example.

For our simple three-page example it is easy to solve the according equation system to determine PageRank values. In practice, the web consists of billions of documents and it is not possible to find a solution by inspection.

The Iterative Computation of PageRank

Because of the size of the actual web, the Google search engine uses an approximative, iterative computation of PageRank values. This means that each page is assigned an initial starting value and the PageRanks of all pages are then calculated in several computation cycles based on the equations determined by the PageRank algorithm. The iterative calculation shall again be illustrated by our three-page example, whereby each page is assigned a starting PageRank value of 1.

Iteration   PR(A)         PR(B)         PR(C)
0           1             1             1
1           1             0.75          1.125
2           1.0625        0.765625      1.1484375
3           1.07421875    0.76855469    1.15283203
4           1.07641602    0.76910400    1.15365601
5           1.07682800    0.76920700    1.15381050
6           1.07690525    0.76922631    1.15383947
7           1.07691973    0.76922993    1.15384490
8           1.07692245    0.76923061    1.15384592
9           1.07692296    0.76923074    1.15384611
10          1.07692305    0.76923076    1.15384615
11          1.07692307    0.76923077    1.15384615
12          1.07692308    0.76923077    1.15384615

We see that we get a good approximation of the real PageRank values after only a few iterations. According to publications of Lawrence Page and Sergey Brin, about 100 iterations are necessary to get a good approximation of the PageRank values of the whole web.
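The table above can be reproduced with a few lines of Python. Note that, like the table, this sketch reuses values computed within the same pass: the new PR(A) already enters the computation of PR(B) in the same iteration, which speeds up convergence:

    d = 0.5
    pr = {"A": 1.0, "B": 1.0, "C": 1.0}    # starting value of 1 per page

    for i in range(1, 13):
        # the three equations of our example, updated in place
        pr["A"] = (1 - d) + d * pr["C"]
        pr["B"] = (1 - d) + d * (pr["A"] / 2)
        pr["C"] = (1 - d) + d * (pr["A"] / 2 + pr["B"])
        print(i, round(pr["A"], 8), round(pr["B"], 8), round(pr["C"], 8))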

Also, by means of the iterative calculation, the sum of all pages' PageRanks still converges to the total number of web pages. So the average PageRank of a web page is 1. The minimum PageRank of a page is given by (1-d). Conversely, there is a maximum PageRank for a page, which is given by dN+(1-d), where N is the total number of web pages. This maximum can theoretically occur if all web pages solely link to one page, and this page also solely links to itself.


Saturday, August 29, 2009

Google Caffeine: A Detailed Test of the New Google

Did you hear? Google’s launching a new, upgraded version of its search engine soon. And just as important, the search giant released the developer’s preview of it today. Google promises that the new search tool (codename “Caffeine”) will improve the speed, accuracy, size, and comprehensiveness of Google search.

While the developer version is a pre-beta release, it’s completely usable. Thus, we’ve decided to put the new Google search through the wringer. We took the developer version for a spin and compared it not only to the current version of Google Search, but to Bing as well.

The categories we tested the new search engine on are as follows: speed, accuracy, temporal relevancy, and index size. Here’s how we define those:

Speed: How fast can the new search engine load results?

Accuracy: Which set of results is more accurate to the search term?

Temporal Relevancy: Is one version of search better at capturing breaking news?

Index Size: Is it really more comprehensive than the last version of Google?

1. Speed
The first category is incredibly important. How fast do these Google search results come at you anyway? Even a tenth of a second can mean millions for the search company, as the longer it takes to load, the more likely someone will go look for results somewhere else.

So how fast is the new search? Lightning fast. As you probably know, Google tells you how long it takes to load results. We tried a few search terms, starting with “Dog.” Here’s the speed result:

[Screenshot: the new Google returns results for “Dog” in 0.12 seconds]

Compare that to the original Google search:

[Screenshot: the current Google returns results for “Dog” in 0.25 seconds]
0.12 vs. 0.25 seconds? They doubled the speed! That’s tremendous.

Friday, August 28, 2009

Google to Launch a New Version of Google Search

Google has a giant target on its back.
Microsoft has been on a spending and deal-making spree to grow Bing, recently signing a huge search deal with Yahoo. And with Bing starting to steal some market share from Google, it’s proving to be a formidable opponent. Oh, and now you can’t count out Facebook either, which just launched a new realtime search engine.

Google’s not taking any of this lying down. Secretly, they’ve been working on a new project: the next generation of Google Search. This isn’t just some minor upgrade, but an entirely new infrastructure for the world’s largest search engine. In other words: it’s a new version of Google.

The project’s still under construction, but Google’s now confident enough in the new version of its search engine that it has released the development version for public consumption. While you won’t see too many differences immediately, let us assure you: it’s a completely upgraded Google search.

Google specifically states that its goal for the new version of Google Search is to improve its indexing speed, accuracy, size, and comprehensiveness. Here’s what they wrote:

“For the last several months, a large team of Googlers has been working on a secret project: a next-generation architecture for Google’s web search. It’s the first step in a process that will let us push the envelope on size, indexing speed, accuracy, comprehensiveness and other dimensions. The new infrastructure sits ‘under the hood’ of Google’s search engine, which means that most users won’t notice a difference in search results.”

Sunday, August 2, 2009

Google >> Yahoo! >> Ask >> MSN

Google

* has been in the search game a long time, and saw the web graph when it was much cleaner than the current web graph
* is much better than the other engines at determining if a link is a true editorial citation or an artificial link
* looks for natural link growth over time
* heavily biases search results toward informational resources
* trusts old sites way too much
* a page on a site or subdomain of a site with significant age or link related trust can rank much better than it should, even with no external citations
* they have aggressive duplicate content filters that filter out many pages with similar content
* if a page is obviously focused on a term, they may filter the document out for that term. On-page variation and link anchor text variation are important. A page with a single reference or a few references to a modifier will frequently outrank pages that are heavily focused on a search phrase containing that modifier
* crawl depth determined not only by link quantity, but also link quality. Excessive low quality links may make your site less likely to be crawled deep or even included in the index.
* things like cheesy off topic reciprocal links are generally ineffective in Google when you consider the associated opportunity cost


Ask

* looks at topical communities
* due to their heavy emphasis on topical communities they are slow to rank sites until they are heavily cited from within their topical community
* due to their limited market share they probably are not worth paying much attention to unless you are in a vertical where they have a strong brand that drives significant search traffic


Yahoo!

* has been in the search game for many years
* is better than MSN but nowhere near as good as Google at determining if a link is a natural citation or not.
* has a ton of internal content and a paid inclusion program, both of which give them an incentive to bias search results toward commercial results
* things like cheesy off topic reciprocal links still work great in Yahoo!

MSN Search

* new to the search game
* is bad at determining if a link is natural or artificial in nature
* due to sucking at link analysis they place too much weight on the page content
* their poor relevancy algorithms cause a heavy bias toward commercial results
* likes bursty recent links
* new sites that are generally untrusted in other systems can rank quickly in MSN Search
* things like cheesy off topic reciprocal links still work great in MSN Search

The technology behind Google's great results

As a Google user, you're familiar with the speed and accuracy of a Google search. How exactly does Google manage to find the right results for every query as quickly as it does? The heart of Google's search technology is PigeonRank™, a system for ranking web pages developed by Google founders Larry Page and Sergey Brin at Stanford University.

PigeonRank System

Building upon the breakthrough work of B. F. Skinner, Page and Brin reasoned that low cost pigeon clusters (PCs) could be used to compute the relative value of web pages faster than human editors or machine-based algorithms. And while Google has dozens of engineers working to improve every aspect of our service on a daily basis, PigeonRank continues to provide the basis for all of our web search tools.

Why Google's patented PigeonRank™ works so well

PigeonRank's success relies primarily on the superior trainability of the domestic pigeon (Columba livia) and its unique capacity to recognize objects regardless of spatial orientation. The common gray pigeon can easily distinguish among items displaying only the minutest differences, an ability that enables it to select relevant web sites from among thousands of similar pages.

By collecting flocks of pigeons in dense clusters, Google is able to process search queries at speeds superior to traditional search engines, which typically rely on birds of prey, brooding hens or slow-moving waterfowl to do their relevance rankings.

When a search query is submitted to Google, it is routed to a data coop where monitors flash result pages at blazing speeds. When a relevant result is observed by one of the pigeons in the cluster, it strikes a rubber-coated steel bar with its beak, which assigns the page a PigeonRank value of one. For each peck, the PigeonRank increases. Those pages receiving the most pecks are returned at the top of the user's results page, with the other results displayed in pecking order.

Integrity

Google's pigeon-driven methods make tampering with our results extremely difficult. While some unscrupulous websites have tried to boost their ranking by including images on their pages of bread crumbs, bird seed and parrots posing seductively in resplendent plumage, Google's PigeonRank technology cannot be deceived by these techniques. A Google search is an easy, honest and objective way to find high-quality websites with information relevant to your search.

PigeonRank Frequently Asked Questions

How was PigeonRank developed?

The ease of training pigeons was documented early in the annals of science and fully explored by noted psychologist B.F. Skinner, who demonstrated that with only minor incentives, pigeons could be trained to execute complex tasks such as playing ping pong, piloting bombs or revising the Abatements, Credits and Refunds section of the national tax code.

Brin and Page were the first to recognize that this adaptability could be harnessed through massively parallel pecking to solve complex problems, such as ordering large datasets or ordering pizza for large groups of engineers. Page and Brin experimented with numerous avian motivators before settling on a combination of linseed and flax (lin/ax) that not only offered superior performance, but could be gathered at no cost from nearby open space preserves. This open space lin/ax powers Google's operations to this day, and a visit to the data coop reveals pigeons happily pecking away at lin/ax kernels and seeds.

What are the challenges of operating so many pigeon clusters (PCs)?

Pigeons naturally operate in dense populations, as anyone holding a pack of peanuts in an urban plaza is aware. This compactability enables Google to pack enormous numbers of processors into small spaces, with rack after rack stacked up in our data coops. While this is optimal from the standpoint of space conservation and pigeon contentment, it does create issues during molting season, when large fans must be brought in to blow feathers out of the data coop. Removal of other pigeon byproducts was a greater challenge, until Page and Brin developed groundbreaking technology for converting poop to pixels, the tiny dots that make up a monitor's display. The clean white background of Google's home page is powered by this renewable process.

Aren't pigeons really stupid? How do they do this?

While no pigeon has actually been confirmed for a seat on the Supreme Court, pigeons are surprisingly adept at making instant judgments when confronted with difficult choices. This makes them suitable for any job requiring accurate and authoritative decision-making under pressure. Among the positions in which pigeons have served capably are replacement air traffic controllers, butterfly ballot counters and pro football referees during the "no-instant replay" years.

Where does Google get its pigeons? Some special breeding lab?

Google uses only low-cost, off-the-street pigeons for its clusters. Gathered from city parks and plazas by Google's pack of more than 50 Phds (Pigeon-harvesting dogs), the pigeons are given a quick orientation on web site relevance and assigned to an appropriate data coop.

Isn't it cruel to keep pigeons penned up in tiny data coops?

Google exceeds all international standards for the ethical treatment of its pigeon personnel. Not only are they given free range of the coop and its window ledges, special break rooms have been set up for their convenience. These rooms are stocked with an assortment of delectable seeds and grains and feature the finest in European statuary for roosting.

What's the future of pigeon computing?

Google continues to explore new applications for PigeonRank and affiliated technologies. One of the most promising projects in development involves harnessing millions of pigeons worldwide to work on complex scientific challenges. For the latest developments on Google's distributed cooing initiative, please consider signing up for our Google Friends newsletter.
 