David Hardtke's Blog

 

The French Trail


I'm currently training for the upcoming Oakland Marathon. Last weekend I needed to go for a long run in spite of a steady rain. I decided to do the French Trail: if there is any trail that will make a run good on a miserable day, it is the French Trail. Here's the GPS. I was surprised to see several other runners out that day, all covered in mud, and all enjoying the rainy day in the forest.

For those of you not familiar with the French Trail, it's located in Redwood Regional Park. The park has four major north-south trails that are excellent for running (East Ridge Trail, Stream Trail, West Ridge Trail, and French Trail). The French Trail is by far the most difficult, but also the best; I think it is the best running trail in the East Bay. The French Trail hugs the eastern side of the mountain that forms the Oakland skyline. Because of this geography, it is able to support the few remaining redwood groves in the area (these are second-generation trees; all of the old growth was cut down to rebuild San Francisco in 1906). The trail has several microclimates, with bits of chaparral interspersed with redwood rain forest.

The trail is quite difficult to reach. You need to hike about a mile just to get to it (park at the Skyline Gate and take the West Ridge Trail). Reaching the best part of the trail (between the Tres Sendas and Chown Trails) requires a hike or run of several miles from the parking lot. The best way to access this part of the trail is to park at the Redwood Bowl parking lot and take the West Ridge Trail to either Tres Sendas or Chown.

 
 
 
 

A study of web traffic from blogs and social media


Based on anonymous usage data collected at Stinky Teddy, we have written a paper entitled "A Measurement of the Social Media Impulse Response Function." This paper is of interest to Internet startups, public relations professionals, and anyone interested in the new web-blog-social media information-sharing ecosystem.

Like most alternative search engines, Stinky Teddy doesn't get much traffic. On an average day we get a few hundred searches on our site (Google handles about 1 billion searches per day worldwide). It doesn't help that our advertising, marketing, and public relations budget is $0. This is not strictly true: we once spent $40 on a Facebook advertising campaign, but that experience warrants a separate blog post.

We do, however, get an occasional surge of traffic. Somebody writes an article about us on their blog and we get a bunch of people checking out the site. Our first surge of traffic came in October, when Frederic Lardinois wrote a short piece on ReadWriteWeb entitled Stinky Teddy: A Cool Real-Time Search Engine with a Rather Odd Name. We didn't know the article was coming, and only noticed that it had been posted when our site crashed (we had a memory leak, since fixed). Before this ReadWriteWeb article we got no traffic whatsoever, as we had not yet released the product.

Being a scientist, I couldn't help but use this ReadWriteWeb post as a chance to do an interesting study. The Internet Entrepreneur's dream scenario is the following:

  1. Build Great Product in Secrecy
  2. Using PR, generate massive news coverage on day of launch
  3. Go viral, with peer-to-peer messaging on social media leading to massive adoption.
This scenario never works for new search engines. Nonetheless, there are a bunch of people out there willing to try a new search engine, and positive news coverage is the way to get on their radar screen.

When it comes to planning for this glorious launch, however, there is one question the Internet Entrepreneur wants answered that nobody will answer: How much traffic will I get, and how long will it last? The study I performed using the ReadWriteWeb post addresses the "how long" question.

The basic premise behind the study was that this single ReadWriteWeb post was solely responsible for all traffic on Stinky Teddy for the next month. Our traffic before was nil, and we did absolutely no marketing or PR during this period. Therefore, any visitor or user on our site during that period was directly or indirectly related to the ReadWriteWeb post. For the first time, we were able to measure the "Impulse Response Function" of the web-blog-social media ecosystem. The "Impulse Response" is how a system responds to a sharp input signal (for a detailed discussion, read the paper). In this study, we measured the hourly and daily traffic on our site. That is the data we need to determine the impulse response. Here's the traffic in the 100 hours after publication:

This shows an interesting two-peak structure to the traffic. The first peak is obviously direct traffic from the ReadWriteWeb blog. We suspect the second peak is due to social media (e.g. Twitter sharing) and news readers (Google Reader, Netvibes, etc.). The second peak corresponds to 9 AM on the East Coast of the United States, so these are people checking yesterday's news when they arrive at work the next morning. We also looked at traffic on Stinky Teddy for the next 25 days:

Here we see something very interesting. In the web-blog-social media ecosystem, stories "ring" for a long time. Half the traffic attributable to the ReadWriteWeb article came more than 4 days after the article was published. Only 10% came during that initial 5-hour burst from the ReadWriteWeb page.
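Figures like "half the traffic came after four days" and "10% came in the first five hours" are cumulative fractions of an hourly visit series. Here is a minimal sketch of that bookkeeping, assuming the traffic is available as a list of hourly visit counts whose first entry is the hour of publication. The function names and the toy numbers are hypothetical; they are not the data from the paper.

```python
from itertools import accumulate


def traffic_fractions(hourly_visits, burst_hours=5, cutoff_hours=96):
    """Return (share of all visits in the first `burst_hours` hours,
    share of all visits arriving at or after hour `cutoff_hours`).

    Assumes hourly_visits[0] is the hour the article went live.
    """
    total = sum(hourly_visits)
    if total == 0:
        raise ValueError("no traffic recorded")
    early = sum(hourly_visits[:burst_hours])
    late = sum(hourly_visits[cutoff_hours:])
    return early / total, late / total


def median_arrival_hour(hourly_visits):
    """First hour index by which at least half of all visits have arrived."""
    total = sum(hourly_visits)
    if total == 0:
        raise ValueError("no traffic recorded")
    for hour, running_total in enumerate(accumulate(hourly_visits)):
        if running_total >= total / 2:
            return hour


# Toy series (made-up numbers): a sharp burst followed by a long, low tail.
visits = [40, 30, 20, 10, 5] + [2] * 91 + [1] * 504  # 600 hours = 25 days
early_share, late_share = traffic_fractions(visits)
print(f"first 5 hours: {early_share:.0%}, after 4 days: {late_share:.0%}")
print("half of all visits had arrived by hour", median_arrival_hour(visits))
```

Even in this toy series, a small per-hour tail that lasts for weeks outweighs a large but brief initial burst, which is the "ringing" effect described above.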

This is a one-time-only experiment. We've had several other momentary spikes in traffic, but only for this period in October through November could we definitively attribute all of the traffic to a single source. It would be interesting if others repeated this study to see whether what we observe is universal. Our main findings are:

  1. Only 10% of traffic eventually generated by the blog post came via early direct clickthroughs from the ReadWriteWeb home page.
  2. There is a two-peak structure in the traffic during the first 24 hours, with the second peak likely associated with "first thing in the morning" readers of yesterday's news through social media sharing or readers.
  3. Half of the traffic (from both direct and indirect sources) came four or more days after the article was posted.
Please read the paper and leave your thoughts below.

 
 
 
 

Full Proposal to Knight Foundation/Thoughts on Fair Use


To my great delight and surprise, my grant proposal to the Knight Foundation was selected for the next level of review. The proposal aims to create a performance royalty system for online journalism. Today I submitted the Full Proposal. Please have a look and post comments on their web site.

After the proposal passed the preliminary round of review, I started to seriously investigate the legal issues involved. I contacted several lawyers, both to get some insight and also to line up future collaborators. My proposal, at its essence, involves charging search engines to index and cache web content. The performance royalty idea is just a fair way of determining the proper distribution of payments to the various news providers and journalists.

The central legal issue is "fair use". From my non-lawyer understanding, fair use means that I am allowed to reproduce a small passage of a copyrighted work under certain conditions. To determine whether an action is legal under fair use, one must apply a balancing test. Factors include the purpose of the use, the nature of the work, whether the use affects the value of the copyrighted work, and the amount of the excerpt compared to the whole.

Somewhere in my academic training, I learned that fair use with regard to text can be mostly captured by the "three-sentence rule": you are allowed to quote three sentences and be safe. Search engines generally follow the three-sentence rule (snippets shown on search results pages are never more than three sentences long). Based on this simple rule, search engines have argued that they should never need to pay to link. I agree for the most part. The Internet is all about linking, and charging to link to and quote others would be disastrous for the Internet.

It's a bit more complicated, however, in the case of search engines. In order to generate a snippet, a search engine must cache the entire content of the document. The document might not be cached in its original form, but the entire document is cached in a derivative form. The snippet is generated in response to a user query; that's why the cache is necessary.
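To see why the whole document has to be cached, consider a deliberately naive sketch of query-dependent snippet generation. It assumes the cached copy is plain text, splits it into sentences, keeps the (at most three) sentences that share the most words with the query, and shows them in document order. The function name, the regex sentence splitter, and the word-overlap scoring are illustrative assumptions, not how any particular search engine works.

```python
import re


def make_snippet(cached_text, query, max_sentences=3):
    """Build a snippet of at most `max_sentences` sentences from a cached page.

    Naive assumptions: sentences end with '.', '!' or '?' followed by
    whitespace, and a sentence's relevance is the number of distinct
    query words it contains.
    """
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", cached_text) if s.strip()]
    query_words = {w.lower() for w in re.findall(r"\w+", query)}

    def score(sentence):
        words = {w.lower() for w in re.findall(r"\w+", sentence)}
        return len(query_words & words)

    # Rank by relevance (ties broken by position), keep the best few that
    # match at all, then restore their original order in the document.
    ranked = sorted(range(len(sentences)), key=lambda i: (-score(sentences[i]), i))
    chosen = sorted(i for i in ranked[:max_sentences] if score(sentences[i]) > 0)
    return " ".join(sentences[i] for i in chosen)


page = ("Redwoods grow here. The trail is muddy. Runners love the redwoods. "
        "Parking is at Skyline Gate. Redwoods need fog.")
print(make_snippet(page, "redwoods trail"))
print(make_snippet(page, "parking"))
```

The two queries pull different sentences out of the same page. Since the engine cannot know in advance which query will arrive, it has to keep the full text, even though each snippet it shows respects the three-sentence limit.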

The interesting question is whether I, as a web site publisher, automatically authorize the automated caching of my copyrighted content once I stick it on the web without password protection. Does allowing people to read my web page also give a search engine crawler the right to read my web page and store its findings? As far as I can tell, the answer to this question is unclear given the current state of the law. During my recent research, I was pointed to an excellent editorial by Bruce Brown and Bruce Sanford on this very subject of Fair Use and Search Engines.

 
 
 
 

Official Launch of Stinky Teddy: New Search Engine Based on Real-time Gossip


FOR IMMEDIATE RELEASE

Media Contact:  David Hardtke

david@stinkyteddy.com or (510) 823-8982 

             

New Search Engine Based on Real-time Gossip

Stinky Teddy listens to world's conversations to build better universal search engine.

 

Oakland, California – December 14, 2009

Stinky Teddy today launched a search engine based on the premise that the people, places, things, concepts and events that people are actively talking about are also often what people want in their search results.  The address is http://www.stinkyteddy.com.

Traditional web search engines, such as Google, Yahoo!, and Bing, rank results based on historical data and historical user behavior.  Recently, both Bing and Google began including updates from the real-time web (Twitter, Facebook, Myspace, Blogs, and other social media sites) on their search result pages.

Stinky Teddy founder David Hardtke, who until recently worked as a research scientist in particle astrophysics at UC Berkeley, felt that content from the real-time web would be more beneficial to a search engine if utilized differently. Explains Hardtke,

    "A single 140 character tweet from someone you don't know is generally not useful to you.  It is often the case, however, that multiple people are tweeting and posting on the same topics that you might research with Google.  This is no coincidence.  The real-time web is the new office water cooler, the place to share the juicy gossip.  The idea behind Stinky Teddy is to use all these recent tweets, status updates, and posts to figure out why you might be coming to a search engine, and give you a better answer or more timely content.  It's wisdom of the crowds meets web search."

Behind the scenes, Stinky Teddy is a metasearch engine that polls multiple primary search engines for each user query.  Current partners include Bing, Yahoo! Boss, and Videosurf.   Additionally, each user query is submitted to multiple real-time search engines, including Twitter, Oneriot, and Collecta.   The content from the real-time search feeds is analyzed using a proprietary algorithm that figures out the up-to-the-minute meaning of the search query.  Based on this knowledge, a universal search page (web, news, images, video, and real-time status updates) is constructed.  Hardtke elaborates,

"Our search results page differs fundamentally from others.  Compared to a regular search engine (Google or Bing), our web, news, image, and video content is more focused on what people are actively talking about and less on the historical meaning of that search query.  For instance, is anyone typing 'Tiger' into Google today interested in tigerdirect.com or facts about the animal?  We suspect not. Compared to a real-time search engine, we don't focus on the links that people are sharing, but instead focus on the language people use to describe contemporary concepts, people and events.  Depending on the level of buzz, we will behave differently, more like a regular search engine (low buzz) or tabloid magazine (high buzz)."

The unusual name comes from a stuffed animal that has been a lifelong companion of Hardtke's daughter.  The web domain was first created as an homage to this particular teddy bear, and later appropriated by the search engine.  Hardtke adds, "I assumed that the name was only temporary, but everyone with whom I spoke responded with either, 'I love the name' or 'I hate the name but won't forget it'."

Stinky Teddy is privately funded and based in Oakland, California.    

For more information contact David Hardtke at david@stinkyteddy.com or call +1 (510) 823-8982.
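The metasearch step described in this release (one user query sent to several primary and real-time search engines at once) can be sketched as a concurrent fan-out. This is a hypothetical illustration, not Stinky Teddy's code. The backends are plain Python callables standing in for the partner APIs, and the policy of silently dropping any engine that is slow or raises an error is an assumption. `cancel_futures` requires Python 3.9 or later.

```python
import concurrent.futures as cf


def fan_out(query, backends, timeout=2.0):
    """Send `query` to every backend concurrently.

    `backends` maps a name to a callable that takes the query string.
    Returns {name: results} for the backends that answered within
    `timeout` seconds without raising; slow or failing ones are left out.
    """
    pool = cf.ThreadPoolExecutor(max_workers=max(1, len(backends)))
    futures = {pool.submit(search, query): name for name, search in backends.items()}
    done, _still_running = cf.wait(futures, timeout=timeout)
    # Don't block on stragglers: queued calls are cancelled, and calls already
    # running finish in the background with their results ignored.
    pool.shutdown(wait=False, cancel_futures=True)
    return {futures[f]: f.result() for f in done if f.exception() is None}


# Stand-in backends (hypothetical); real ones would call partner search APIs.
def web_search(q):
    return [f"web page about {q}"]


def realtime_search(q):
    return [f"status update mentioning {q}"]


print(fan_out("tiger", {"web": web_search, "realtime": realtime_search}))
```

Issuing the calls in parallel keeps the user's wait close to that of the slowest engine that responds in time, rather than the sum of all of them. Merging the returned lists into a universal results page, and the proprietary analysis of the real-time content, are not shown.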

 

 

 

 

 
 
 
 

New Economic Model for Journalism and Search Engines


While developing Stinky Teddy, I thought quite a bit about the business of search, from the perspective of the consumer and the search engine, but also from the perspective of the content providers. Most of my thinking, of course, was about how to build a search engine that was attractive to consumers: how are the needs and wishes of the consumers not being fully served by Google? The only way to get people to use your product is to give them a compelling reason to do so. We're trying to build a search engine that is better than Google some of the time, as good as Google most of the time, and never substantially worse than Google. If we can achieve that, we hope that people will trust us with some of their precious attention.

Search, however, is a funny business. The profit margins in search are very high. As with all things software-related, the marginal cost of an extra customer is very, very small. Most of the work goes into developing the software and building up the hardware infrastructure. As a metasearch engine, we don't have to do our own crawl of the complete Internet, so our fixed capital costs are very low. The portion of our costs that scales with our level of traffic (the extra computing resources, bandwidth, server administration, etc.) is also very low (somewhere around one hundredth of a cent per search).

We aren't trying to monetize our traffic to begin with; we're more concerned with gathering important usage data that will allow us to build a better search engine. Eventually, however, we'll start making money by putting sponsored links next to and (more lucratively) above the search results. The amount of revenue that this will generate depends on many factors (user demographics, types of searches performed, quality of advertisers), but the big boys in general purpose search (Google, Yahoo, Bing, AOL, Ask) generate an average of 10 cents per search for US traffic. A business with a unit cost of one hundredth of a cent and unit revenue of 10 cents sounds pretty good, eh?
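Spelled out with the round numbers above, where R is revenue per search (the roughly 10-cent industry average for US traffic) and C is the marginal cost per search (the roughly one-hundredth-of-a-cent estimate for Stinky Teddy), the per-search arithmetic is the following. Both inputs are the estimates quoted in this post, not measured results.

$$
\text{gross margin per search} = \frac{R - C}{R} = \frac{\$0.10 - \$0.0001}{\$0.10} = \frac{\$0.0999}{\$0.10} = 99.9\%,
\qquad
\frac{R}{C} = \frac{\$0.10}{\$0.0001} = 1000
$$

In other words, each search would bring in about a thousand times what it costs to serve.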

Typically, businesses with absurdly high profit margins attract competitors who gain market share by undercutting the price. This is where search is a funny business. From a consumer perspective, search is free. You can't undercut a price of free.

I'd argue, however, that search is not really free from a consumer perspective. Huh? There's no PayPal widget on Google. Chris Anderson recently wrote a book called "Free: The Future of a Radical Price" that explained why it is impossible to directly charge for anything on the Internet (disclosure: I haven't read the book, only reviews). People want stuff to be free, and who can blame them? But free and Free are different. In my mind, something that is "free" is costless to the consumer, but something that is "Free" is paid for by the consumer in non-monetary ways. For the consumer Internet, "Free" most often means that the publisher sells our valuable time and attention to a third party.

We've all made an unconscious and unwritten deal with the search engines: give us what we want for free most of the time, and we'll let you sell our most valuable moments to the highest bidder. When our attention turns to valuable activities (i.e., we're about to spend some money), the sponsored links show up above the search results.

People pay differently. Most people claim they never click on sponsored links. For these people, there is a mental cost to identifying and skipping the sponsored links, and this filtering process necessitates absorbing some of the commercial message. Other people don't know the difference between sponsored links and normal results and simply click on the first link they see -- these people are the ones who subsidize the search engine for the rest of us (we owe them our gratitude). These folks pay directly through higher prices for the goods they buy (the money paid for that click has to be built into the price of the good sold).

Our "monetizable moments" are fairly infrequent, so this is a pretty good bargain for the consumer. Most of the time, we get what we want for free. The search engine gets to control our attention when we are truly hot commodities -- they serve as the gatekeepers to our wallet. This system is good for them.

One group, however, gets hosed in this economic system. The people who suffer are the journalists and other original content providers. It's very simple: they provide much of the content that we consume for free when we use the search engine, but get none of the benefit of our "monetizable moments." Sure, they get to show some banner ads next to the articles we read, but by the time the consumer has reached their page, he or she is essentially worthless to advertisers. The journalists provide a disproportionate share of the material that appears on the search engine, but get almost none of the revenue. It's kind of like a contractor who builds and sells a house but doesn't pay for the concrete in the foundation.

While thinking about this, I came up with the perfect analogy. Search engines are like radio stations. With radio, consumers listen to songs for free in exchange for being forced to listen to an occasional advertisement (the "monetizable moment"). Songwriters are the foundation of the whole system. The big, big difference between search engines/journalists and radio stations/songwriters is that radio stations pay the songwriters performance royalties. A performance royalty is a fixed fee paid to the composer of a song whenever the song is played. Why not do the same thing on the Internet? Why not pay the content creator whenever an aggregator (e.g. a search engine) links to and quotes from an article?
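Carried over to the web, the radio rule suggests very simple accounting. The sketch below assumes an aggregator logs one "play" for each time it links to and quotes an article, and that every play earns the same flat fee. Both the log format and the fee are hypothetical; this post specifies neither. Under those assumptions, the royalty owed to each publisher is simply its play count times the fee.

```python
from collections import Counter
from decimal import Decimal


def royalties_owed(plays, fee_per_play):
    """Total royalty owed to each publisher.

    `plays` holds one publisher name per time an aggregator linked to and
    quoted one of that publisher's articles. `fee_per_play` is a flat fee,
    kept as a Decimal to avoid floating-point rounding in money amounts.
    """
    counts = Counter(plays)
    return {publisher: n * fee_per_play for publisher, n in counts.items()}


# Hypothetical play log and fee, for illustration only.
log = ["news-a.example", "blog-b.example", "news-a.example"]
print(royalties_owed(log, Decimal("0.001")))
```

A flat per-play fee mirrors the definition above. A real system would still need to decide how plays are counted and audited, which is the kind of infrastructure the proposal describes building.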

Although this will be bad for Stinky Teddy (added costs), I put together and submitted a proposal to the Knight Foundation News Challenge (Performance Royalty System for Online Journalism). The proposal outlines this idea and proposes to implement the technology and build the needed non-profit organization.

What I'm advocating is a radical change in the economics of the Internet, but quality journalism is the foundation upon which much of the Internet rests and should be kept viable.

 
 
 
 
 

    © Stinky Teddy