Patent Analysis
Below is a straightforward analysis/interpretation of the key parts of patent application #20050071741, a patent filed by Google regarding information retrieval technology (which is essentially what its search engine does).
The analysis that follows is broken down according to the various sections of the patent application itself. To help understand the referencing, check out the official application in its entirety here.
Claim #2: "The method of claim 1, wherein the one or more types of history data includes information relating to an inception date; and wherein the generating a score includes: determining an inception date corresponding to the document, and scoring the document based, at least in part, on the inception date corresponding to the document."
Translation: Google looks at how old a web page is. Older web pages are often accorded more trust, and new web pages will need to prove their trustworthiness over time.
Claim #3: "The method of claim 2, wherein the document includes a plurality of documents; and wherein the scoring the document includes: determining an age of each of the documents based on the inception dates corresponding to the documents, determining an average age of the documents based on the ages of the documents, and scoring the documents based, at least in part, on a difference between the ages of the documents and the average age."
Translation: Try to have your web page associated with web pages that are already trustworthy. If a web page is old and ranks reasonably well for its corresponding search query, getting a link from it can be useful -- especially if the web page discusses the same topic as yours.
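The arithmetic described in Claim #3 can be sketched in a few lines. This is only an illustration of "age minus average age" for a set of documents; the function name and the sample dates are made up for the example, and the patent does not say how the resulting difference is turned into a score.

```python
from datetime import date

def age_deviation(inception_dates, today):
    """Return each document's age in days minus the average age of the set."""
    ages = [(today - d).days for d in inception_dates]
    average = sum(ages) / len(ages)
    return [age - average for age in ages]

# Hypothetical inception dates for three documents.
deviations = age_deviation(
    [date(2000, 1, 1), date(2004, 1, 1), date(2005, 1, 1)],
    today=date(2005, 7, 1),
)
# A positive value means the document is older than the average of the set.
```

In this sample the first document comes out older than average and the other two come out younger.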
Claim #5: "The method of claim 2, wherein the inception date corresponding to the document is based on at least one of a date when a search engine first discovers the document, a date when a search engine first discovers a link to the document, and a date when the document includes at least a predetermined number of pages."
Translation: The claim lists the date a site reaches X number of pages as one possible starting point for its age, which suggests that a web site with at least X pages may fare better than one with fewer. What exactly X is equal to is still a mystery, but the take-home point should be to make sure your web site has a bit of depth to it. One effective way of doing this is to break up a long document into multiple web pages. This is often better for the user, as long web pages require scrolling, something that causes many people to skim rather than read.
Claim #7: "The method of claim 6, wherein the frequency at which the content of the document changes is based on at least one of an average time between the changes, a number of changes in a time period, and a comparison of a rate of change in a current time period with a rate of change in a previous time period."
Translation: Web sites with fresh content -- meaning content that is updated on a regular basis -- are held in higher regard by Google. This point is touched upon a number of times throughout the patent application.
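To make the three measures named in Claim #7 concrete, here is a rough sketch that computes the average time between changes, the number of changes in a period, and the count for the preceding period of equal length (for the rate comparison). The function name, the choice of "previous period", and the sample dates are assumptions for illustration, not anything the patent specifies.

```python
from datetime import date

def change_frequency(change_dates, period_start, period_end):
    """Return (average days between changes, changes in period, changes in previous period)."""
    dates = sorted(change_dates)
    gaps = [(later - earlier).days for earlier, later in zip(dates, dates[1:])]
    average_gap = sum(gaps) / len(gaps) if gaps else None
    # Assumption: the "previous period" is the window of equal length just before period_start.
    previous_start = period_start - (period_end - period_start)
    current = sum(1 for d in dates if period_start <= d <= period_end)
    previous = sum(1 for d in dates if previous_start <= d < period_start)
    return average_gap, current, previous

# Hypothetical dates on which a page's content changed.
changes = [date(2005, 1, 1), date(2005, 1, 11), date(2005, 1, 21), date(2005, 2, 10)]
average_gap, current, previous = change_frequency(
    changes, date(2005, 2, 1), date(2005, 2, 28)
)
```

Here the page changed less often in February than in the window before it, which is the kind of slowdown the claim's rate comparison would pick up.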
Claim #10: "The method of claim 8, wherein the determining an amount by which the content of the document changes includes: weighting different portions of the content of the document differently based on a perceived importance of the portions, and determining the amount by which the content of the document changes as a function of the differently weighted portions of the content."
Translation: Different portions of a web page are treated differently. For example, a link to your site from the body of a web page is most likely worth more than a link from the footer of the same page.
Claim #11: "The method of claim 6, wherein the document includes a plurality of documents; and wherein the scoring the document includes: determining a date on which the content of each of the documents last changed, determining an average date of change based on the determined dates on which the contents of the documents last changed, and scoring the documents based, at least in part, on a difference between the dates on which the contents of the documents last changed and the average date of change."
Translation: A web page will rank better if it is linked to from other web pages that have fresh content. A comprehensive news site, for instance, will probably have multiple pages that have content updated frequently -- and hence it is more likely to have all of its interlinked, frequently updated pages rank well.
Claim #13: "The method of claim 12, wherein the amount by which the content of the document changes is based on at least one of a number of new pages associated with the document within a time period, a ratio of a number of new pages associated with the document versus a total number of pages associated with the document, and a percentage of the content of the document that has changed during a time period."
Translation: You'll need to have a substantial portion of the page updated for it to be considered fresh. Changing a few words won't cut the mustard.
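Two of the measures in Claim #13 are easy to approximate. The sketch below computes the ratio of new pages to total pages and a rough "fraction of content changed" between two versions of a page. Using Python's difflib on word lists is my own stand-in for measuring change; the patent does not describe how Google would do it, and the function names are hypothetical.

```python
from difflib import SequenceMatcher

def new_page_ratio(new_pages, total_pages):
    """Ratio of new pages to total pages associated with a document."""
    return new_pages / total_pages if total_pages else 0.0

def fraction_changed(old_text, new_text):
    """Rough share of a page's words that differ between two versions (0.0 = identical)."""
    old_words, new_words = old_text.split(), new_text.split()
    return 1.0 - SequenceMatcher(None, old_words, new_words).ratio()

# Swapping a single word barely moves the needle; a rewrite moves it a lot.
small_edit = fraction_changed("our spring swimming schedule", "our summer swimming schedule")
rewrite = fraction_changed("our spring swimming schedule", "new pool hours and prices")
```

The point of the comparison is that a one-word tweak yields a much smaller change figure than a real rewrite.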
Claim #14: "The method of claim 12, wherein the determining an amount by which the content of the document changes includes: weighting different portions of the content of the document differently based on a perceived importance of the portions, and determining the amount by which the content of the document changes as a function of the differently weighted portions of the content."
Translation: This ties into Claim #10; various parts of the web page are accorded different significance. As a result, adding fresh content to the footer of your web page is not as significant as adding fresh content to the body of your web page.
Claim #16: "The method of claim 15, wherein the scoring the document includes assigning a higher score to the document when the document is selected more often than other documents in the set of search results over a time period."
Translation: Google is tracking the click-through rate from its search results page. For example, let's say that when someone types "scary books" into a search engine, your result comes up at the bottom of the first page. If people are repeatedly clicking on your link from the search results page, you will eventually move up the ranks.
Claim #19: "The method of claim 1, wherein the one or more types of history data includes information relating to staleness of documents; and wherein the generating a score includes: determining whether the document is stale, and scoring the document based, at least in part, on whether the document is stale."
Translation: Another emphasis on fresh content. If your page has content that has not been updated in quite some time, your page is considered old, and will lose ranking accordingly.
Claim #21: "The method of claim 20, wherein the determining whether stale documents are considered favorable for the search query is based, at least in part, on how often stale documents were selected over recent documents over time for the search query."
Translation: If your content is old but is repeatedly clicked upon from the search results page, your page will be considered trustworthy, and will not lose ranking on the grounds of being old.
Claim #25: "The method of claim 22, wherein the determining behavior of links associated with the document includes monitoring at least one of time-varying behavior of links associated with the document, how many links associated with the document appear or disappear during a time period, and whether there is a trend toward appearance of new links associated with the document versus disappearance of existing links associated with the document."
Translation: Are you getting more links to your web pages or are you losing links to your web pages? What is the trend? If you are gaining links, you'll be accorded more trust, and will start to rank higher.
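If you keep your own snapshots of who links to you, the appear/disappear trend from Claim #25 is simple to track. The sketch below compares two snapshots of linking URLs; the function name and the example URLs are invented for illustration, and this says nothing about how Google itself collects or weighs the data.

```python
def link_trend(previous_links, current_links):
    """Return (links gained, links lost, net change) between two snapshots."""
    previous, current = set(previous_links), set(current_links)
    gained = current - previous
    lost = previous - current
    return len(gained), len(lost), len(gained) - len(lost)

# Hypothetical backlink snapshots taken a month apart.
last_month = ["a.example/page", "b.example/post", "c.example/links"]
this_month = ["a.example/page", "d.example/review", "e.example/blog", "f.example/news"]
gained, lost, net = link_trend(last_month, this_month)
```

A positive net change means links are appearing faster than they are disappearing.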
Claim #28: "The method of claim 26, wherein the weight assigned to a link is based on at least one of how much a document containing the link is trusted, how authoritative a document containing the link is, and a freshness of a document containing the link."
Translation: Not all links are created equal. Links from pages that Google already trusts -- meaning links from pages that are already ranking well -- will help your site to rank better.
Claim #31: "The method of claim 1, wherein the one or more types of history data includes information relating to differences in documents and anchor text associated with links to the documents; and wherein the generating a score includes: determining whether a content of the document changes such that the content differs from anchor text associated with one or more links to the document, and scoring the document based, at least in part, on whether the content of the document changes such that the content differs from the anchor text associated with one or more links to the document."
Translation: Google is keeping track of the text being used to link to your web page. If the text that is used to link to your web page -- referred to as the "anchor text" -- contains your keywords, that is beneficial for your web page. Don't overdo it though; that can look suspicious, and you can get penalized for that. As always, balance is the key.
Claim #34: "The method of claim 1, wherein the one or more types of history data includes information relating to traffic associated with documents; and wherein the generating a score includes: determining characteristics of traffic associated with the document, and scoring the document based, at least in part, on the characteristics of traffic associated with the document."
Translation: Google uses the traffic data it has associated with the page to determine if the page was relevant to the user's search query. As noted earlier in this series, Google has tools (the Google Toolbar, Google Desktop Search, and ownership of web analytics firm Urchin, to name a few) that enable it to have substantial information regarding the traffic associated with various web pages.
Claim #46: "The method of claim 45, wherein the user maintained or generated data relates to at least one of favorites lists, bookmarks, temp files, and cache files associated with one or a plurality of users."
Translation: Google monitors to see which web pages have been bookmarked. Presumably the underlying belief is that a web page that is more frequently bookmarked is more trustworthy and of more value to the end user.
Claim #59: "The method of claim 58, wherein the adjusting the ranking includes penalizing the ranking if the longevity indicates a short life for the linkage data and boosting the ranking if the longevity indicates a long life for the linkage data."
Translation: If you have many short-term links -- meaning links to your site that only last for a short period of time -- Google can penalize your site and deem it as untrustworthy. The possibility of this occurring opens a whole can of worms, which we'll discuss in the next article.
Section 0074: "Links may be weighted in other ways. For example, links may be weighted based on how much the documents containing the links are trusted (e.g., government documents can be given high trust)."
Translation: Links from official government web pages (for example, pages with a .gov domain extension) are deemed more trustworthy. Presumably the same may apply to links from web pages of official educational institutions (i.e. those with a .edu domain extension).
Section 0077: "The dates that links appear can also be used to detect "spam," where owners of documents or their colleagues create links to their own document for the purpose of boosting the score assigned by a search engine. A typical, "legitimate" document attracts back links slowly. A large spike in the quantity of back links may signal a topical phenomenon (e.g., the CDC web site may develop many links quickly after an outbreak, such as SARS), or signal attempts to spam a search engine (to obtain a higher ranking and, thus, better placement in search results) by exchanging links, purchasing links, or gaining links from documents without editorial discretion on making links. Examples of documents that give links without editorial discretion include guest books, referrer logs, and "free for all" pages that let anyone add a link to a document."
Translation: Another piece of the document that has potentially enormous implications. Essentially, the implication is that pages that acquire links too quickly can be deemed untrustworthy, as they may be attempting to manipulate rankings.
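One simple way to picture a "large spike in the quantity of back links" is to compare the latest month's new links against the average of the months before it. The sketch below does just that. The factor of 5 and the function name are arbitrary assumptions for the example, and as the patent itself notes, a spike can also be a legitimate topical phenomenon, so a flag like this is only a starting point.

```python
from statistics import mean

def is_link_spike(monthly_new_links, factor=5.0):
    """Flag the latest month if its new links exceed `factor` times the earlier average.

    `factor` is a hypothetical threshold chosen for illustration only.
    """
    if len(monthly_new_links) < 2:
        return False  # not enough history to compare against
    *history, latest = monthly_new_links
    baseline = max(mean(history), 1.0)  # avoid a zero baseline for brand-new pages
    return latest > factor * baseline

steady = is_link_spike([10, 12, 9, 11, 14])
sudden = is_link_spike([10, 12, 9, 11, 200])
```

The first series grows slowly and is not flagged; the second jumps far above its own history and is.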
Section 0090: "Additionally, or alternatively, search engine 125 may monitor time-varying characteristics relating to "advertising traffic" for a particular document. For example, search engine 125 may monitor one or a combination of the following factors: (1) the extent to and rate at which advertisements are presented or updated by a given document over time; (2) the quality of the advertisers (e.g., a document whose advertisements refer/link to documents known to search engine 125 over time to have relatively high traffic and trust, such as amazon.com, may be given relatively more weight than those documents whose advertisements refer to low traffic/untrustworthy documents, such as a pornographic site); and (3) the extent to which the advertisements generate user traffic to the documents to which they relate (e.g., their click-through rate). Search engine 125 may use these time-varying characteristics relating to advertising traffic to score the document."
Translation: Google is looking at who is advertising on your site, meaning who you are linking to. Interestingly enough, linking to credible advertisers may boost your rankings. Does this mean running AdSense -- Google's own contextual ad program -- will boost your rankings?
Section 0094: "If a document is returned for a certain query and over time, or within a given time window, users spend either more or less time on average on the document given the same or similar query, then this may be used as an indication that the document is fresh or stale, respectively. For example, assume that the query "Riverview swimming schedule" returns a document with the title "Riverview Swimming Schedule." Assume further that users used to spend 30 seconds accessing it, but now every user that selects the document only spends a few seconds accessing it. Search engine 125 may use this information to determine that the document is stale (i.e., contains an outdated swimming schedule) and score the document accordingly."
Translation: Once again, Google is suggesting that it can track user behavior, and will incorporate this information into its ranking algorithm. If users are spending less time on a web page than they previously were, this may suggest the web page is old and no longer relevant -- and hence warrants a decrease in ranking.
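The swimming schedule example boils down to comparing average time on page between an earlier window and a recent one. Here is a minimal sketch of that comparison. The 50% drop threshold and the function name are my own assumptions; the patent gives no numbers beyond the 30 seconds versus "a few seconds" example.

```python
def looks_stale(earlier_seconds, recent_seconds, drop_ratio=0.5):
    """Return True if recent average time on page fell below `drop_ratio` of the earlier average.

    `drop_ratio` is a hypothetical threshold, not a value from the patent.
    """
    earlier_avg = sum(earlier_seconds) / len(earlier_seconds)
    recent_avg = sum(recent_seconds) / len(recent_seconds)
    return recent_avg < drop_ratio * earlier_avg

# Visitors used to spend about 30 seconds on the page; now only a few.
outdated_schedule = looks_stale([30, 28, 32], [3, 4, 2])
current_schedule = looks_stale([30, 28, 32], [29, 31, 30])
```

Watching the same numbers in your own analytics is a cheap way to spot pages that need a refresh.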
Section 0097: "According to an implementation consistent with the principles of the invention, information relating to a domain associated with a document may be used to generate (or alter) a score associated with the document. For example, search engine 125 may monitor information relating to how a document is hosted within a computer network (e.g., the Internet, an intranet or other network or database of documents) and use this information to score the document."
Translation: The kind of hosting you have can be factored in as well. Are you on virtual hosting, where you are sharing an IP address with other sites? Perhaps this hurts your trustworthiness, as more credible and sincere sites are likely to have their own IP.
Section 0099: "Certain signals may be used to distinguish between illegitimate and legitimate domains. For example, domains can be renewed up to a period of 10 years. Valuable (legitimate) domains are often paid for several years in advance, while doorway (illegitimate) domains rarely are used for more than a year. Therefore, the date when a domain expires in the future can be used as a factor in predicting the legitimacy of a domain and, thus, the documents associated therewith."
Translation: Buy your domain names for a longer period of time; this shows that you are committed to your web site, and hence more trustworthy.
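As a small illustration of the signal described in Section 0099, the sketch below works out how many years remain before a domain expires. The function name and dates are made up, and the patent does not state how many years of registration would count as "legitimate".

```python
from datetime import date

def years_until_expiry(expiry_date, today):
    """Approximate number of years left on a domain registration."""
    return (expiry_date - today).days / 365.25

# Hypothetical registrations: one renewed year to year, one paid five years ahead.
short_term = years_until_expiry(date(2006, 3, 1), today=date(2005, 7, 1))
long_term = years_until_expiry(date(2010, 7, 1), today=date(2005, 7, 1))
```

The second registration has roughly five years left, while the first has well under one.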
Section 0101: "Also, or alternatively, the age, or other information, regarding a name server associated with a domain may be used to predict the legitimacy of the domain. A "good" name server may have a mix of different domains from different registrars and have a history of hosting those domains, while a "bad" name server might host mainly pornography or doorway domains, domains with commercial words (a common indicator of spam), or primarily bulk domains from a single registrar, or might be brand new. The newness of a name server might not automatically be a negative factor in determining the legitimacy of the associated domain, but in combination with other factors, such as ones described herein, it could be."
Translation: Pick a web host with a good reputation. If your web host is also hosting sites that are not trustworthy, you could be penalized by association.
Section 0106: "A query set (e.g., of commercial queries) can be repeated, and documents that gained more than M % in the rankings may be flagged or the percentage growth in ranking may be used as a signal in determining scores for the documents. For example, search engine 125 may determine that a query is likely commercial if the average (median) score of the top results is relatively high and there is a significant amount of change in the top results from month to month. Search engine 125 may also monitor churn as an indication of a commercial query. For commercial queries, the likelihood of spam is higher, so search engine 125 may treat documents associated therewith accordingly."
Translation: Not all search queries are treated equally. If you are trying to optimize your site for a term like "search engine optimization," for example, you may find that Google issues penalties more swiftly than if you are trying to optimize for "charities for homeless." Commercial terms are under greater scrutiny.
Section 0116: "In an alternative implementation, other types of user data that may indicate an increase or decrease in user interest in a particular document over time may be used by search engine 125 to score the document. For example, the "temp" or cache files associated with users could be monitored by search engine 125 to identify whether there is an increase or decrease in a document being added over time. Similarly, cookies associated with a particular document might be monitored by search engine 125 to determine whether there is an upward or downward trend in interest in the document."
Translation: Leave your mark on your visitor's computer, as Google will be accessing that information as well. One simple way this can be done is by using cookies, and adding links that allow your users to easily bookmark your site.
Section 0124: "A sudden growth in the number of apparently independent peers, incoming and/or outgoing, with a large number of links to individual documents may indicate a potentially synthetic web graph, which is an indicator of an attempt to spam. This indication may be strengthened if the growth corresponds to anchor text that is unusually coherent or discordant. This information can be used to demote the impact of such links, when used with a link-based scoring technique, either as a binary decision item (e.g., demote the score by a fixed amount) or a multiplicative factor."
Translation: More evidence that Google is cracking down on spam. If you make manipulation of anchor text evident, you will get penalized.
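The last sentence of Section 0124 names two ways to demote suspicious links: a fixed deduction or a multiplicative factor. The difference is easy to see in code. The function, its parameter names, and the sample numbers below are all hypothetical; the patent gives no actual values.

```python
def demote(link_score, mode, amount):
    """Reduce a link-based score by a fixed amount or by a multiplicative factor."""
    if mode == "fixed":
        return max(link_score - amount, 0.0)  # never drop below zero
    if mode == "multiplicative":
        return link_score * amount  # amount is expected to be between 0 and 1
    raise ValueError(f"unknown mode: {mode}")

fixed_result = demote(10.0, "fixed", 3.0)
scaled_result = demote(10.0, "multiplicative", 0.5)
```

A fixed deduction hits weak pages proportionally harder, while a multiplicative factor scales every score by the same share.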
This is a lot of important information. In the next article, we'll wrap things up and see what others are saying around the web regarding this patent.
The analysis that follows is broken down according to the various sections of the patent application itself. To help understand the referencing, check out the official application in its entirety here.
Claim #2: "The method of claim 1, wherein the one or more types of history data includes information relating to an inception date; and wherein the generating a score includes: determining an inception date corresponding to the document, and scoring the document based, at least in part, on the inception date corresponding to the document."
Translation: Google looks at how old a web page is. Older web pages are often accredited more trust, and new web pages will need to prove their trustworthiness over time.
Claim #3: "The method of claim 2, wherein the document includes a plurality of documents; and wherein the scoring the document includes: determining an age of each of the documents based on the inception dates corresponding to the documents, determining an average age of the documents based on the ages of the documents, and scoring the documents based, at least in part, on a difference between the ages of the documents and the average age."
Translation: Try to have your web page associated with web pages that are already trustworthy. If a web page is old and ranks reasonably well for its corresponding search query, getting a link from it can be useful -- especially if the web page discusses the same topic as yours.
Claim #5: "The method of claim 2, wherein the inception date corresponding to the document is based on at least one of a date when a search engine first discovers the document, a date when a search engine first discovers a link to the document, and a date when the document includes at least a predetermined number of pages."
Translation: A web site with X number of pages will rank better than a web site with fewer than X number of pages. What exactly X is equal to is still a mystery, but the take home point should be to make sure your web site has a bit of depth to it. One effective way of doing this is to break up a long document into multiple web pages. This is often better for the user as long web pages require scrolling, something that causes many people to skim rather than read.
Claim #7: "The method of claim 6, wherein the frequency at which the content of the document changes is based on at least one of an average time between the changes, a number of changes in a time period, and a comparison of a rate of change in a current time period with a rate of change in a previous time period."
Translation: Web sites with fresh content -- meaning content that is updated on a regular basis -- are held in higher regard by Google. This point is touched upon a number of times throughout the patent application.
Claim #10: "The method of claim 8, wherein the determining an amount by which the content of the document changes includes: weighting different portions of the content of the document differently based on a perceived importance of the portions, and determining the amount by which the content of the document changes as a function of the differently weighted portions of the content."
Translation: Different portions of a web page are treated differently. For example, a link to your site from the body of a web page is most likely worth more than a link from the footer of the same page.
Claim #11: "The method of claim 6, wherein the document includes a plurality of documents; and wherein the scoring the document includes: determining a date on which the content of each of the documents last changed, determining an average date of change based on the determined dates on which the contents of the documents last changed, and scoring the documents based, at least in part, on a difference between the dates on which the contents of the documents last changed and the average date of change."
Translation: A web page will rank better if it is linked to from other web pages that have fresh content. A comprehensive news site, for instance, will probably have multiple pages that have content updated frequently -- and hence it is more likely to have all of its interlinked, frequently updated pages rank well.
Claim #13: "The method of claim 12, wherein the amount by which the content of the document changes is based on at least one of a number of new pages associated with the document within a time period, a ratio of a number of new pages associated with the document versus a total number of pages associated with the document, and a percentage of the content of the document that has changed during a time period."
Translation: You'll need to have a substantial portion of the page updated for it to be considered fresh. Changing a few words won't cut the mustard.
Claim #14: "The method of claim 12, wherein the determining an amount by which the content of the document changes includes: weighting different portions of the content of the document differently based on a perceived importance of the portions, and determining the amount by which the content of the document changes as a function of the differently weighted portions of the content."
Translation: This ties into Claim #10; various parts of the web page are accorded different significance. As a result, adding fresh content to the footer of your web page is not as significant as adding fresh content to the body of your web page.
Claim #16: "The method of claim 15, wherein the scoring the document includes assigning a higher score to the document when the document is selected more often than other documents in the set of search results over a time period."
Translation: Google is tracking the click through rate from its search results page. For example, let's say when someone types in "scary books" into a search engine, your result comes up on the bottom of the first page. If people are repeatedly clicking on your link from the search results page, you will eventually move up the ranks.
Claim #19: "The method of claim 1, wherein the one or more types of history data includes information relating to staleness of documents; and wherein the generating a score includes: determining whether the document is stale, and scoring the document based, at least in part, on whether the document is stale."
Translation: Another emphasis on fresh content. If your page has content that has not been updated in quite some time, your page is considered old, and lose ranking accordingly.
Claim #21: "The method of claim 20, wherein the determining whether stale documents are considered favorable for the search query is based, at least in part, on how often stale documents were selected over recent documents over time for the search query."
Translation: If your content is old but is repeatedly clicked upon from the search results page, your page will be considered trustworthy, and will not lose ranking on the grounds of being old.
Claims #25: "The method of claim 22, wherein the determining behavior of links associated with the document includes monitoring at least one of time-varying behavior of links associated with the document, how many links associated with the document appear or disappear during a time period, and whether there is a trend toward appearance of new links associated with the document versus disappearance of existing links associated with the document."
Translation: Are you getting more links to your web pages or are you losing links to your web pages? What is the trend? If you are gaining links, you'll be accorded more trust, and will start to rank higher.
Claim #28: "The method of claim 26, wherein the weight assigned to a link is based on at least one of how much a document containing the link is trusted, how authoritative a document containing the link is, and a freshness of a document containing the link."
Translation: Not all links are created equal. Links from pages that Google already trusts -- meaning links from pages that are already ranking well -- will help your site to rank better.
Claim #31: "The method of claim 1, wherein the one or more types of history data includes information relating to differences in documents and anchor text associated with links to the documents; and wherein the generating a score includes: determining whether a content of the document changes such that the content differs from anchor text associated with one or more links to the document, and scoring the document based, at least in part, on whether the content of the document changes such that the content differs from the anchor text associated with one or more links to the document."
Translation: Google is keeping track of the text being used to link to your web page. If the text that is used to link to your web page -- referred to as the "anchor text" -- contains your keywords, that is beneficial for your web page. Don't overdo it though; that can look suspicious, and you can get penalized for that. As always, balance is the key.
Claim #34: "The method of claim 1, wherein the one or more types of history data includes information relating to traffic associated with documents; and wherein the generating a score includes: determining characteristics of traffic associated with the document, and scoring the document based, at least in part, on the characteristics of traffic associated with the document."
Translation: Google uses the traffic data it has associated with the page to determine if the page was relevant to the user's search query. As noted earlier in this series, Google has tools (Google toolbar, Google desktop search, ownership of web analytics firm Urchin to name a few) that enable it to have substantial information regarding the traffic associated with various web pages.
Claim #46: "The method of claim 45, wherein the user maintained or generated data relates to at least one of favorites lists, bookmarks, temp files, and cache files associated with one or a plurality of users."
Translation: Google monitors to see which web pages have been bookmarked. Presumably the underlying belief is that a web page that is more frequently bookmarked is more trustworthy and of more value to the end user.
Claim #59: "The method of claim 58, wherein the adjusting the ranking includes penalizing the ranking if the longevity indicates a short life for the linkage data and boosting the ranking if the longevity indicates a long life for the linkage data."
Translation: If you have many short-term links -- meaning links to your site that only last for a short period of time -- Google can penalize your site and deem it as untrustworthy. The possibility of this occurring opens a whole can of worms, which we'll discuss in the next article.
Section 0074: "Links may be weighted in other ways. For example, links may be weighted based on how much the documents containing the links are trusted (e.g., government documents can be given high trust)."
Translation: Links from official government web pages (for example, pages with .gov domain extension) are deemed more trustworthy. Presumably the same may apply of links from web pages of official educational institutions (i.e. those with .edu domain extension).
Section 0077: "The dates that links appear can also be used to detect "spam," where owners of documents or their colleagues create links to their own document for the purpose of boosting the score assigned by a search engine. A typical, "legitimate" document attracts back links slowly. A large spike in the quantity of back links may signal a topical phenomenon (e.g., the CDC web site may develop many links quickly after an outbreak, such as SARS), or signal attempts to spam a search engine (to obtain a higher ranking and, thus, better placement in search results) by exchanging links, purchasing links, or gaining links from documents without editorial discretion on making links. Examples of documents that give links without editorial discretion include guest books, referrer logs, and "free for all" pages that let anyone add a link to a document."
Translation: Another piece of the document that has potentially enormous implications. Essentially, the implication is that pages that acquire links too quickly can be deemed untrustworthy, as they be attempting to manipulate rankings.
Section 0090: "Additionally, or alternatively, search engine 125 may monitor time-varying characteristics relating to "advertising traffic" for a particular document. For example, search engine 125 may monitor one or a combination of the following factors: (1) the extent to and rate at which advertisements are presented or updated by a given document over time; (2) the quality of the advertisers (e.g., a document whose advertisements refer/link to documents known to search engine 125 over time to have relatively high traffic and trust, such as amazon.com, may be given relatively more weight than those documents whose advertisements refer to low traffic/untrustworthy documents, such as a pornographic site); and (3) the extent to which the advertisements generate user traffic to the documents to which they relate (e.g., their click-through rate). Search engine 125 may use these time-varying characteristics relating to advertising traffic to score the document."
Translation: Google is looking at who is advertising on your site, meaning who you are linking to. Interestingly enough, linking to credible advertisers may boost your rankings. Does this mean running AdSense -- Google's own contextual ad program -- will boost your rankings?
Section 0094: "If a document is returned for a certain query and over time, or within a given time window, users spend either more or less time on average on the document given the same or similar query, then this may be used as an indication that the document is fresh or stale, respectively. For example, assume that the query "Riverview swimming schedule" returns a document with the title "Riverview Swimming Schedule." Assume further that users used to spend 30 seconds accessing it, but now every user that selects the document only spends a few seconds accessing it. Search engine 125 may use this information to determine that the document is stale (i.e., contains an outdated swimming schedule) and score the document accordingly."
Translation: Once again, Google is suggesting that it can track user behavior, and will incorporate this information into its ranking algorithm. If users are spending less time on a web page than they previously were, this may suggest the web page is old and no longer relevant -- and hence warrants a decrease in ranking.
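The swimming schedule example can be sketched in a few lines. This is only an illustration: the function name and the 50% drop threshold are assumptions, since the patent does not say how large a change in time-on-page would matter.

```python
def looks_stale(past_avg_seconds, recent_avg_seconds, drop_threshold=0.5):
    """Return True when the recent average time on a document has
    fallen below drop_threshold times its past average.

    Hypothetical helper: the threshold is an illustrative guess.
    """
    if past_avg_seconds <= 0:
        return False  # no meaningful baseline to compare against
    return recent_avg_seconds < drop_threshold * past_avg_seconds

# Users used to spend 30 seconds on the page, now only a few.
print(looks_stale(30, 3))
# A small dip would not be treated as staleness in this sketch.
print(looks_stale(30, 28))
```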
Section 0097: "According to an implementation consistent with the principles of the invention, information relating to a domain associated with a document may be used to generate (or alter) a score associated with the document. For example, search engine 125 may monitor information relating to how a document is hosted within a computer network (e.g., the Internet, an intranet or other network or database of documents) and use this information to score the document."
Translation: The kind of hosting you have can be factored in as well. Are you on virtual hosting, where you are sharing an IP address with other sites? Perhaps this hurts your trustworthiness, as more credible and sincere sites are likely to have their own IP.
Section 0099: "Certain signals may be used to distinguish between illegitimate and legitimate domains. For example, domains can be renewed up to a period of 10 years. Valuable (legitimate) domains are often paid for several years in advance, while doorway (illegitimate) domains rarely are used for more than a year. Therefore, the date when a domain expires in the future can be used as a factor in predicting the legitimacy of a domain and, thus, the documents associated therewith."
Translation: Buy your domain names for a longer period of time; this shows that you are committed to your web site, and hence more trustworthy.
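As a rough sketch, the expiry date could be turned into a number between 0 and 1. The linear scaling and the function name are assumptions for illustration; the only figure taken from the patent is the 10-year renewal limit.

```python
from datetime import date

def expiry_signal(expires_on, today, max_years=10):
    """Scale the time left until a domain expires to the range 0..1.

    Hypothetical helper: the linear scaling is an assumption. The
    10-year cap mirrors the renewal limit mentioned in the patent.
    """
    years_left = (expires_on - today).days / 365.25
    return max(0.0, min(years_left / max_years, 1.0))

today = date(2005, 1, 1)
print(expiry_signal(date(2005, 6, 1), today))   # expires in a few months
print(expiry_signal(date(2015, 1, 1), today))   # paid up for ten years
```

In this sketch a domain that has already expired simply scores 0.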
Section 0101: "Also, or alternatively, the age, or other information, regarding a name server associated with a domain may be used to predict the legitimacy of the domain. A "good" name server may have a mix of different domains from different registrars and have a history of hosting those domains, while a "bad" name server might host mainly pornography or doorway domains, domains with commercial words (a common indicator of spam), or primarily bulk domains from a single registrar, or might be brand new. The newness of a name server might not automatically be a negative factor in determining the legitimacy of the associated domain, but in combination with other factors, such as ones described herein, it could be."
Translation: Pick a web host, and in particular a name server, with a good reputation. If the name server behind your domain also serves sites that are not trustworthy, you could be penalized by association.
Section 0106: "A query set (e.g., of commercial queries) can be repeated, and documents that gained more than M % in the rankings may be flagged or the percentage growth in ranking may be used as a signal in determining scores for the documents. For example, search engine 125 may determine that a query is likely commercial if the average (median) score of the top results is relatively high and there is a significant amount of change in the top results from month to month. Search engine 125 may also monitor churn as an indication of a commercial query. For commercial queries, the likelihood of spam is higher, so search engine 125 may treat documents associated therewith accordingly."
Translation: Not all search queries are treated equally. If you are trying to optimize your site for a term like "search engine optimization," for example, you may find that Google issues penalties more swiftly than if you are trying to optimize for "charities for homeless." Commercial terms are under greater scrutiny.
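The "churn" the patent mentions can be pictured as the share of last month's top results that have dropped out this month. The sketch below is one simple way to measure that; the function name and the sample result lists are made up for illustration, and the patent does not specify a formula.

```python
def churn(previous_top, current_top):
    """Fraction of the previous top results that are missing from
    the current top results (0.0 = no change, 1.0 = all replaced).

    Hypothetical helper: one possible definition of churn.
    """
    prev, curr = set(previous_top), set(current_top)
    if not prev:
        return 0.0
    return len(prev - curr) / len(prev)

last_month = ["a.com", "b.com", "c.com", "d.com"]
this_month = ["a.com", "b.com", "x.com", "y.com"]
print(churn(last_month, this_month))
```

A query whose top results keep turning over from month to month would, on this reading, look more commercial and attract more scrutiny.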
Section 0116: "In an alternative implementation, other types of user data that may indicate an increase or decrease in user interest in a particular document over time may be used by search engine 125 to score the document. For example, the "temp" or cache files associated with users could be monitored by search engine 125 to identify whether there is an increase or decrease in a document being added over time. Similarly, cookies associated with a particular document might be monitored by search engine 125 to determine whether there is an upward or downward trend in interest in the document."
Translation: Leave your mark on your visitors' computers, as Google may be looking at that information as well. One simple way this can be done is by using cookies, and by adding links that allow your users to easily bookmark your site.
Section 0124: "A sudden growth in the number of apparently independent peers, incoming and/or outgoing, with a large number of links to individual documents may indicate a potentially synthetic web graph, which is an indicator of an attempt to spam. This indication may be strengthened if the growth corresponds to anchor text that is unusually coherent or discordant. This information can be used to demote the impact of such links, when used with a link-based scoring technique, either as a binary decision item (e.g., demote the score by a fixed amount) or a multiplicative factor."
Translation: More evidence that Google is cracking down on spam. If a sudden burst of links makes manipulation of anchor text evident, those links can be discounted and you will get penalized.
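The two demotion options named in Section 0124 differ in a simple way, sketched below. The function names, the penalty of 2.0, and the factor of 0.5 are assumptions for illustration only; the patent gives no actual values.

```python
def demote_fixed(link_score, penalty=2.0):
    """Binary-decision style: subtract a fixed amount, never going
    below zero. The penalty value is an illustrative guess."""
    return max(0.0, link_score - penalty)

def demote_multiplicative(link_score, factor=0.5):
    """Multiplicative style: scale the score by a factor between
    0 and 1. The factor value is an illustrative guess."""
    return link_score * factor

print(demote_fixed(10.0))           # fixed amount removed
print(demote_multiplicative(10.0))  # score scaled down
```

A fixed penalty hits weak pages proportionally harder, while a multiplicative factor shrinks every suspicious score by the same share.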
This is a lot of important information. In the next article, we'll wrap things up and see what others are saying around the web regarding this patent.

