WGHubris on January 17th, 2011

Although the case is being misinterpreted all over the Internet, it is true that police officers and other government law enforcement agencies can search your cell phone without first obtaining a warrant. The part that everyone is leaving out is that they may search your mobile phone without a warrant only after you have been arrested.

In other words, just like your coat pockets and other personal effects, your cell phone may be looked at by police officers once you have been legally detained for probable cause (arrested). It’s not like a cop can just grab your iPhone and go through it for no reason. Also, police have no more access to calls you make and receive on your cell phone than they do with your regular phone.

If you are waiting for this court decision to be overturned, don’t hold your breath. This case went 5-2 in California, one of the most liberal courts in the land, so chances are not good for this one to be thrown out higher up the legal food chain.

WGHubris on January 15th, 2011

Counting links is the core of Google’s ranking algorithm. The idea that the more incoming links a webpage has the greater its “authority” or the more highly recommended it is has powered Google search for a long time. Unfortunately, one of the main reasons Google is broken is that counting links is no longer a valid way to determine the quality or even the popularity of a webpage.

Another major reason Google is broken is the title tag.

The title tag is one of many tags used in HTML, the markup language of the web. In HTML, the webpage creator defines a title tag by putting something in between <title> and </title>.
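To make that concrete, here is a minimal sketch in Python that pulls out whatever sits between <title> and </title>. It uses the standard library's html.parser module; the class name, the function name, and the sample page are all made up for this illustration.

```python
from html.parser import HTMLParser


class TitleExtractor(HTMLParser):
    """Collects the text found between <title> and </title>."""

    def __init__(self):
        super().__init__()
        self.in_title = False
        self.title = ""

    def handle_starttag(self, tag, attrs):
        if tag == "title":
            self.in_title = True

    def handle_endtag(self, tag):
        if tag == "title":
            self.in_title = False

    def handle_data(self, data):
        # Text may arrive in several chunks, so append rather than assign.
        if self.in_title:
            self.title += data


def extract_title(html):
    """Return the title tag text of an HTML string, or '' if there is none."""
    parser = TitleExtractor()
    parser.feed(html)
    parser.close()
    return parser.title.strip()


# Hypothetical page, invented for this example.
sample_page = """<html>
<head><title>HP LaserJet 1012 Windows 7 Printer Drivers</title></head>
<body><h1>Drivers and Downloads</h1><p>Page text goes here.</p></body>
</html>"""

print(extract_title(sample_page))
```

Note that the text inside the h1 element in the page body is not part of the result. Only the title tag is read.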

You might think that whatever is input between the title tags is the title that you see on a webpage, but you would be wrong. The visible title at the top of most webpages is actually created by something called a header tag. Header tags are also powerful search engine optimization factors, but secondary to the title tag.

The title tag does not impact anything that you see ON a webpage. Rather, the title tag determined what was displayed in the title bar of older web browsers, before there were tabs for each open webpage. Today, the title tag determines what is displayed on the tab of a particular webpage. Any modern browser with more than one webpage open uses tabs and the title is almost never completely displayed, yet the title tag is STILL one of the most powerful Google ranking signals.

Overemphasis on Title Tag

Once upon a time, the search engines on the Internet were very unintelligent. The earliest ones did nothing more than count the number of times a word or phrase appeared on a webpage. Thus, if a webpage said “Ford Trucks” ten times, then it was a “better” result for someone searching for “ford trucks” than a page that only said it six times.

Of course, when people caught on, they just started repeating the word as much as possible. They stopped even bothering trying to include it in text somehow and just started putting words at the bottom of the webpage and repeating them over and over. Some webpages started doing it with the same text color as the background color so that it didn’t look bad to people, but all of those keywords counted.
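Here is a rough sketch in Python of that kind of keyword counting, and of why stuffing beat it. This is not the code of any real search engine; the function names and the two sample pages are invented for illustration.

```python
import re


def keyword_count(page_text, phrase):
    """Count non-overlapping, case-insensitive occurrences of phrase."""
    return len(re.findall(re.escape(phrase.lower()), page_text.lower()))


def rank_by_keyword_count(pages, phrase):
    """Return page names sorted from most to fewest occurrences."""
    return sorted(
        pages,
        key=lambda name: keyword_count(pages[name], phrase),
        reverse=True,
    )


# Hypothetical pages: one written for people, one stuffed with keywords.
pages = {
    "honest_review": "Ford trucks are popular. We tested three Ford trucks this year.",
    "stuffed_page": "Buy now! " + "ford trucks " * 10,
}

ranking = rank_by_keyword_count(pages, "ford trucks")
print(ranking)
```

The stuffed page comes out on top simply because it repeats the phrase more often, which is exactly the weakness described above.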

The main reason for Google’s success was that it did not depend on those keywords for its rankings, so all of those spammy webpages didn’t clutter up its search engine results pages, or SERPs.

Instead of using the text of a webpage, where there was too much keyword spam to use reliably, Google uses the title tag to determine what a webpage is about. While counting incoming links can tell you which pages are best regarded, you still have to pair the content with the searcher’s intent. The best root canal webpage in the world is a terrible result for someone searching for beach volleyball, no matter how many links it has pointing to it.

Thus, Google search rankings are a result of combining how well the text in the title tag matches the text typed into the Google search box with how many links are pointing to the best matching webpages.

This is how the content farms consistently show up high in Google search rankings. A webpage with a title tag of “how to use hp laserjet 1012 printer on windows 7” is a better match for the same search than a webpage titled “HP LaserJet 1012 Windows 7 Printer Drivers”, even if the latter has more incoming links and better information. By constantly adding content with every possible variation on popular or frequent searches, the content mills can rank their webpages high for specific Google searches.
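The simplified model described in this post (title match first, link count second) can be sketched in a few lines of Python. To be clear, this is a toy version of the argument made here, not Google's actual algorithm, which has never been published. The scoring function, the page list, and the link counts are all assumptions made up for the example.

```python
def title_match(title, query):
    """Fraction of query words that also appear in the title (0.0 to 1.0)."""
    title_words = set(title.lower().split())
    query_words = query.lower().split()
    if not query_words:
        return 0.0
    hits = sum(1 for word in query_words if word in title_words)
    return hits / len(query_words)


def rank(pages, query):
    """Sort by title match first, then by incoming link count as a tie-breaker."""
    return sorted(
        pages,
        key=lambda page: (title_match(page["title"], query), page["links"]),
        reverse=True,
    )


# Hypothetical pages and link counts, invented for this example.
pages = [
    {"title": "HP LaserJet 1012 Windows 7 Printer Drivers", "links": 500},
    {"title": "how to use hp laserjet 1012 printer on windows 7", "links": 20},
    {"title": "Root Canal Guide", "links": 9000},
]

query = "how to use hp laserjet 1012 printer on windows 7"
for page in rank(pages, query):
    print(page["title"], page["links"])
```

Under these assumptions the content-farm style title wins with only 20 links, the page with 500 links comes second, and the heavily linked but irrelevant page comes last.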

Repetitively publishing such similar content just to capture subtle variations in search text typed by users is not feasible for legitimate publishers. The New York Times, for example, would not be well served by publishing an article about Pentagon spending in numerous variations just so that it will rank highly for not just the original title, but for other ways that users might search for it. Thus, the Times ends up with one well-deserved high ranking, while someone else claims the high rankings on all the variations.

Ironically, the more legitimate a resource is, the less likely it is to customize its title tags on every webpage. For example, many government webpages are the actual, legally defined authority on certain subjects, but because they do not “optimize” their title tags, other webpages (legitimate or not) show up above them despite the boost that Google gives to webpages housed on .gov domains.

For example, searching for “colorado school rankings” brings back not the official Colorado Department of Education website, nor the test results for the official test, but rather numerous sites that republish the data of the former.

Why don’t the real websites rank higher?

Because their title tags are things like “Colorado Department of Education” and “Unit Assessment Test Rankings”.

No matter what data is on your webpage, if there is another page with a closer title tag to the search performed, it won’t show up, even if the exact phrase does appear in the body of your article.

Up Next: Why Google Is Broken – Anchor Text


WGHubris on January 14th, 2011

There has been a lot of talk recently about the amount of spam that appears in Google’s search engine rankings. Indeed, the once-vaunted gold standard of Internet search has come under an increasingly critical eye as technology writers re-examine what was a non-issue. Many of these commenters and Google critics are blaming the so-called content mills and the SEO consultants and webmasters who are “gaming” the Google ranking algorithm. Unofficial Google spokesman Matt Cutts recently commented that members of the web spam team had been moved on to other projects inside of Google.

However, none of these issues addresses the real problem with Google’s search engine. The real problem with Google search rankings is that many of the core principles upon which Google’s search algorithm was built are no longer true, or were never true in the first place.

Google Ranking Algorithm

The full Google ranking system has never been disclosed by the company. However, early researchers of Google search engine functionality used papers published by Google founders, as well as patents and other public documents, to piece together the core ranking system used by Google. Subsequent comments and pronouncements by the company, as well as real-world business research and methodology have painted a fairly strong picture of what goes on at the heart of Google’s ranking algorithm.

According to official and semi-official pronouncements from Google, there are hundreds of “signals” that go into ranking the webpages returned by Google’s search engine in response to a user search query. However, the vast majority of these so-called influences are insignificant compared to the “core” ranking signals.

The crux of Google’s search rankings is the paradigm that each link pointing to a website is an endorsement of the quality of that webpage’s content. The idea behind this ranking signal is that the more people who recommend a website by pointing their own readers to it, the better that website must be. However, this paradigm falls apart completely under the most minor scrutiny.

For this to be a valid search ranking signal, content publishers must link elsewhere only because of perceived value.

This is so naïve as to be laughable. Webmasters and web page creators link to all manner of content around the web for all manner of reasons. Sometimes links are to things that are funny. Others are links to something negative.

A recent New York Times article pointed out that one website owner ranks highly in Google search results almost solely on the basis of those so enraged by the company’s business practices that they have filled forums, review sites, and more with negative reviews and scathing commentary. Despite the context, each and every link counted as a vote up in Google’s search algorithm.
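A tiny Python sketch shows why this happens when links are simply counted. The site names, the links, and the "tone" labels are all hypothetical, and this is only the naive link counting described in this post, not Google's real system.

```python
from collections import Counter

# Hypothetical links: (source page, target site, tone of the surrounding text).
links = [
    ("forum-post-1", "shady-store.example", "negative"),
    ("review-site-2", "shady-store.example", "negative"),
    ("blog-3", "shady-store.example", "negative"),
    ("blog-4", "honest-store.example", "positive"),
]


def count_inbound_links(links):
    """Count links per target site; the tone is ignored entirely."""
    return Counter(target for _source, target, _tone in links)


votes = count_inbound_links(links)
print(votes.most_common())
```

Three angry links outvote one happy link, because a bare count has no idea why anyone linked.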

After this particular high-profile embarrassment, Google tweaked its algorithm, supposedly so such things wouldn’t happen in the future, but more likely with no more effect than lowering the rankings of this particular website.

The “no follow” tag was supposed to prevent problems like this from occurring. By adding a no follow tag (in practice, a rel="nofollow" attribute) to a standard HTML link, the content creator can curate their links to prevent their approval from passing through a link. However, in order for the no follow tag to work, the web content writer must understand webpage markup (HTML) AND must understand, and care about, how Google ranking power flows through links. This is obviously not the kind of thing that your average review writer has a grasp of.
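For anyone who has not seen one, a no follow link is an ordinary HTML link with rel="nofollow" added to it. The Python sketch below, built on the standard library's html.parser, separates the links in a snippet into those that would pass on approval and those marked nofollow. The class name and the sample snippet are invented for this example, and how a search engine actually treats each group is up to the search engine.

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Sorts <a href=...> links into followed and nofollow lists."""

    def __init__(self):
        super().__init__()
        self.followed = []
        self.nofollowed = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        attr_map = dict(attrs)
        href = attr_map.get("href")
        if href is None:
            return
        # rel may hold several space-separated values, e.g. "nofollow noopener".
        rel_values = (attr_map.get("rel") or "").lower().split()
        if "nofollow" in rel_values:
            self.nofollowed.append(href)
        else:
            self.followed.append(href)


# Hypothetical snippet: one recommendation and one warning.
sample = (
    '<p>Great guide: <a href="http://good.example/">read it</a>. '
    'Avoid this store: <a href="http://bad.example/" rel="nofollow">details</a>.</p>'
)

collector = LinkCollector()
collector.feed(sample)
collector.close()
print(collector.followed)
print(collector.nofollowed)
```

The catch, as noted above, is that the person writing the warning has to know to type rel="nofollow" in the first place.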

Which brings us to the first reason Google’s search algorithm is failing.

Web publishing is no longer the province of “experts.”

When Google was first created, there was a small, if significant, barrier to entry to publishing web content. While tools such as Geocities and even AOL allowed users to create webpages, one had to first know such a thing existed, and then decide that it was worth whatever effort was necessary to publish online.

Doing so required nothing less than sitting down at a computer with an Internet connection (not all of them had one then), typing the webpage, and then publishing it somewhere. Under these circumstances, it was a logical assumption that anything that someone bothered to write about and then link to, was something that the person felt was somehow noteworthy. Any “bad links” would be buried under the better links that more people wrote about.

These days, however, publishing online is something that is not only easy, but known about, and used by virtually everyone. Numerous services from Twitter, to Facebook, to Tumblr, and more allow almost anyone to post almost anything with virtually no learning curve. The computer isn’t even required anymore. Most smartphones provide a half-dozen ways to publish content online right out of the box.

Today, a link might be an endorsement or recommendation, but it could just as easily be a gag, a way to help a friend find a homework assignment, or a legal disclaimer automatically inserted without the user’s knowledge into every post or email sent.

Unfortunately, it seems as though Google’s search engineers don’t get out of the Googleplex enough to realize that the world around them has changed fundamentally. The search ranking continues to rely on the fantasy that the sheer numbers of the Internet will swamp all of those non-worthy links.

It doesn’t take much analysis to come to the conclusion that non-endorsement links are just as common as those fully intended to pass on their so-called link juice, and that is BEFORE one takes into account the number of people deliberately creating links meant for no purpose OTHER than to add a little more ranking power to each website.

Why Google Is Broken – The Title Tag


WGHubris on January 8th, 2011

I finally stumbled across a mainstream journalist who refuses to buy into the hype around Facebook and calls it like everyone who isn’t a techie blogger sees it. Facebook doesn’t make all that much money now (profit-wise) and its only hope for the future is to cash in while everyone seems ignorant of the fact that users are not the same thing as dollars.

Facebook’s valuation climbs with every investment if, and only if, you subscribe to the theory that the value of a company is what the last buyer paid for it. That doesn’t mean that it has figured out how to turn users into profits. And it most certainly doesn’t mean that it has insulated itself from being the next AOL.

The only question is, will Facebook cash out before everyone sees the smoke and mirrors, or will it sink slowly, taking all of that temporary wealth with it? It looks as though Goldman Sachs’ clients will get a front row seat to that contest.