Cutting one Cord…Re-connecting to another

What if an industry many had destined for a “big reset” took the lead in its own disruption?

This may best describe the state of the TV industry today.

“The business has never been better” – Bruce Rosenblum, president of Warner Bros. Television Group, speaking on the challenges and opportunities facing the TV industry, 2012 NAB Show.

There has been no lack of skeptics, or of coverage of the pending demise of the TV industry, in 2012, just as in 2011 and 2010. While the mood across the industry has been sombre for the past couple of years, events in 2012 support the statement made above.

There has been a sense of revival at the NAB convention and The Cable Show this year, with a focus on technical innovation, a unified consumer experience, developing partnerships, and embracing the changing demographics the industry has been dealing with.

What has changed?

Skeptics will be pleased to know that the industry still faces the same continuous state of technical and cultural change that it has over the past few years: the switch from standard to high definition, the emergence of on-demand viewing (pay TV, DVR), and the rapid growth and maturity of the Internet.

These technical enhancements have left the TV industry with a modern infrastructure and the best technical means available today to improve the consumer experience.

Where are the problems?

“It’s complicated, the industry is very fragmented; until this works itself out, the waters are going to get really choppy” was how the current state of the TV industry was described to me. Why complicated and fragmented? There is little collaboration going on between the cable providers, broadcasters, and technology companies, each serving up their own version of the user experience.

Part of this battle is over the 100M households that the cable providers combined currently reach. For the past two years the industry has seen subscriptions decline due to cord-cutting, or simply households canceling their subscriptions.

Perhaps at the root of all of this is the need for the broadcasters, and to some degree the cable providers, to take a radical departure from the concept of basic and premium channels, and possibly from the channel itself, if the industry is going to continue to tell the consumer why it’s the best connection for the future.

The Numbers.

Statistics published by SNL Kagan

In 2001, traditional analog subscriptions peaked at 66.9M households as the cable providers began to roll out digital video, which had reached 14.6M households. Broadband access, still in its infancy, had reached 7.3M households.

By 2011 this shift in consumer behavior was more visible and difficult to ignore, as subscriber growth in cable TV slowed and broadband growth accelerated. From 2006 to 2011, combined cable offerings did increase by ~6 million, from ~96 million to ~102 million households, but broadband reached ~50M households, an increase of ~16M.

Added to this behavioral shift was a growing demographic of cord-cutters, who have been reported to have reduced the subscriber count by ~4.5M households since 2010.

This shift in viewing behavior has left ad revenues flat since 2009; though they have rebounded in 2012, advertisers are more cautious about which networks they spend their dollars with.

The TV manufacturers and retailers have not been left unscathed, as cord-cutting has resulted in lower sales and declining margins due to incentives and discounting.
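To put the two growth figures quoted above side by side, here is a minimal Python sketch of the arithmetic. It uses only the rounded numbers in the paragraph; the 2006 broadband base is not stated in the text and is inferred here as the 2011 total minus the reported gain, so treat the percentages as rough.

```python
# Rounded figures quoted in the text (SNL Kagan), in millions of households.
cable_2006, cable_2011 = 96.0, 102.0
broadband_2011, broadband_gain = 50.0, 16.0

# Assumption: the 2006 broadband base is the 2011 total minus the reported gain.
broadband_2006 = broadband_2011 - broadband_gain


def pct_growth(start, end):
    """Percentage growth from start to end."""
    return (end - start) / start * 100.0


cable_growth = pct_growth(cable_2006, cable_2011)
broadband_growth = pct_growth(broadband_2006, broadband_2011)

print(f"Cable:     {cable_2006:.0f}M -> {cable_2011:.0f}M, growth {cable_growth:.2f}%")
print(f"Broadband: {broadband_2006:.0f}M -> {broadband_2011:.0f}M, growth {broadband_growth:.2f}%")
```

On these rounded inputs cable grew by roughly 6% over the five years while broadband grew by roughly 47%, which is the gap the paragraph is pointing at.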

What is the TV industry doing to stay a step ahead of all this change?

A recent Deloitte survey, “State of the Media Democracy”, reported that 1 in 5 households are either cord-cutting or planning to do so over the next year. That would equate to a staggering ~11 million of the 100 million households canceling their cable.

The cord-cutters, saving themselves ~1,000 dollars a year in cable bills, typically convert to broadband-only subscribers. While current economic conditions are considered a factor, there is a long-term macro trend taking shape: a younger demographic coming out of high school or college with no intention of spending their dollars on a traditional cable TV package, instead getting their programming from online sources at a reasonable cost. Providers, seeing this threat to the industry and trying to win over this demographic, are pushing broadband first, with the option of a stripped-down cable package.

Many cable providers have begun to put more emphasis on this subscriber shift, even positioning themselves as “broadband” companies. Quarterly earnings calls have become a platform to promote how many broadband subscribers were gained and how this offsets the loss of cable subscriptions.

Why are providers embracing this shift?

Higher margins, for now. Broadband subscribers earn 20% more in gross profit margin, just by eliminating the licensing fees paid to the broadcasters. In 2012, broadband-only service has led to higher reported revenue per subscriber.

Providers at some point in the future will want to shed the unwanted caveat that comes with cable subscriptions: paying the broadcasters a licensing fee. The intensified fee disputes over this past year serve as an example of the precarious relationship between the providers and broadcasters.

Played out in public, the disputes have become a common occurrence, leading to broadcasters’ services being pulled from the provider. Passing the fees along to the subscriber, as has typically been done, is no longer a welcome option for fear of lost subscriptions. Nor will the subscriber willingly be used as a negotiation tactic for long.

What is happening today?

The message has become less about the technology and more about what the technology will deliver to the user and, more importantly, how it will improve their experience.

As with the music and newspaper industries, the technology challengers are lining up at the TV industry’s gate. So far, though, there has been plenty of talk, positioning, and rumor, but little has been delivered that has changed the user experience.

These challengers’ biggest limitation: content acquisition costs. Any online streaming provider wanting network programming quickly finds itself in the same position as the traditional providers, paying the broadcasters.

Hulu and Netflix got the unified viewing experience across multiple devices right early on. The success of these services has been one of the biggest motivators for consumers to cut the cord. Their future will be decided by their ability to expand content while holding down subscriber costs.

How the providers decide to work with some challengers will be seen in the development of an improved set-top box. Many providers see the current video platform as archaic and welcome others to develop a broader distribution platform  that will reach traditional TV, web, and mobile devices.

Lumped in with these challengers are Apple, Google, and Microsoft, who will have the biggest influence on where the TV industry is headed.

Microsoft, beyond pushing the Xbox as the platform to deliver content into the home, hasn’t gotten further than this. Plans made in 2011 to hire an experienced network exec to lead this initiative fell apart in 2012, when it was announced that the plans were on hold, reportedly because of high licensing fees.

With Apple come plenty of rumors and anticipation. AppleTV has been a huge sales success but is content-starved, and Apple considers it a hobby. Steve Jobs was quoted as saying that the TV industry was broken and that he had figured out how to crack it. Talk of a smart TV isn’t close to reality; more likely is a next-generation AppleTV with conventional cable capabilities. You have to think that with 200M iTunes account holders across devices, and experience in negotiating content agreements, both providers and broadcasters will work with them.

Google had a rough start with GoogleTV and now the Nexus Q, but it has the boldest vision of the three.

Its teaming up with Hollywood producers and celebrities, with 100M committed to developing original content for YouTube, is taking shape. As with traditional programming, all it takes is a couple of blockbusters for the platform to take off.

Google Fiber puts them that much closer to our living rooms and mobile devices. The first rollout of this high-speed Internet + digital TV service is underway in Kansas City, making Google a competitor of the cable providers. So far the service is in wait-and-see mode, with plans to expand to new cities; it will help show other providers how to market broadband, and spur some innovation.

The viewer in the end will need to be convinced why they need it, and the convincing will come in the form of multi-screen distribution.

What will change the viewer experience?

Viewers are still watching live events, be it awards shows, breaking news, or sports. If there is one live programming category that will influence where viewing habits are going, it’s sports. Sports built the conventional TV networks, cable, and satellite, and is now heavily influencing online and mobile adoption. Sports is probably the number one reason the majority of subscribers still pay for cable.

How important is sports really to TV?

In 1993, after 38 years, the NFL left CBS for Fox, then a relative newcomer to broadcast TV with no established sports division. Fox outbid CBS by well over 1B dollars to acquire the rights to the NFC broadcasts, providing it with a known platform to promote its other shows. As a result of this loss, CBS would suffer both a decline in overall ratings and a loss of affiliates. For Fox, this win would serve as the tipping point in becoming a real 24/7 network, as well as alter the NFL viewer experience for the better.

Watch any NFL game and count the promos for the broadcaster’s other shows; live sports is a critical platform for building interest and returning viewership. Look at the highest-rated programs over the past five seasons: on average 7 out of 10 have been NFL games, and the majority of the top fifty have been sporting events. Sports is now the “flagship” brand for the broadcaster.

Consider that ESPN gets a 5 dollar/subscriber fee every month from each cable provider, and can dictate how it rolls out its content for online viewing. Viewers want to be connected to their sports, and the broadcaster can use this as a springboard for building out an online digital strategy.

Sports took control of multi-platform distribution early on, with MLB.TV leading the way, developing a rather progressive service that kept viewers connected with its games. Key to all of this is the creation of a global reach, ensuring the experience reaches the most avid fan anywhere.

What does this all mean? Well, for one, the cable providers and broadcasters aren’t going away anytime soon. Will their business model evolve? Yes, but both will play an important role in the shift in viewer behavior. TV Everywhere could be seen as the first step taken by both in how they distribute content online in the future, though today it’s in a highly controlled state.

Technology is important, to a certain point, but it needs the content to create the experience the viewer wants. As the experience progresses, so will the viewer. At the center of this storm, though, is the content: how it’s distributed and paid for. The concept of a channel as a distribution method will slowly fade away as advanced set-top boxes and multi-platform distribution begin to take form. Sports programming distribution will determine how viewers want to pay for all content, and how advertising will be sold. The future of TV will be dictated by how the viewer wants to be connected to their sports.

The Economies of Change

As described in earlier blog posts, much of my economic thinking and analysis is influenced by Schumpeter’s economic theories. The process Schumpeter describes as “creative destruction”, the driving force of change and sustainability in free market economies, can be seen throughout history.

Many of the institutional frameworks have entirely unraveled, casualties of the Great Recession, the effects far-reaching, some yet to be fully realized. Emerging from this downturn are new opportunities for entrepreneurs, the vital source of creative destruction, whose progressive ideas are critical to economic change: innovating in industries and frameworks that have become inefficient, and continuing the cycle of growth.

Technology has driven our culture, how we work, communicate, and function on a daily basis, at the highest rates of change over the past 50 years. Every period of expansion has left industries behind, replaced by technological advancement; without a doubt this expansion, as slow and bumpy as it has been, will be no different. This last recession has left many unanswered questions, and a belief that the evolution underway needs to be a radical departure rather than picking up where we left off. Decisions will be made on which industries and frameworks are worth carrying forward, and how to adopt new technology to improve our decision-making, what products we purchase, who we want to do business with, social equality, the employment market, education, and information transparency.

More to come.

The Hiatus

Per ipsum sit scientia

When I originally started this blog in 2010, my intent was to provide a different perspective on data, analytics, and the technologies evolving around them. It’s been over a year since I last posted. Not that I didn’t have topics; I drafted more than 4 different posts, but I found each topic either cluttered by discussion or, worse, I wasn’t really providing any further insight.

Why the renewal now? First, my exposure to a variety of organizations, and deciphering the data problems each faced, has yielded a breadth of experience. Second, engaging with or applying a range of technologies, both old and new, has given me subject matter worth writing about. And on a personal note, I was given a finite period to enjoy the company of a long-time companion who had given me her unconditional support; I owed her beach and fetch time.

Observations and lessons learned:

An unwanted side effect of complexity is failure. “Plan for it” is discussed, even documented, but more often than not it’s poorly implemented. Designing distributed computing systems is not easy; if anything you’ve introduced new problems to manage, and availability and consistency shouldn’t be on that list. I was taught a long time ago that if it’s not stable, it won’t scale, and then forget adaptable.

Code is now a commodity, displaced by the consumerization of information, with the emphasis shifting to data re-use. Transportable, and assembled in the most efficient technology, code is now an interchangeable module, no longer constraining the distribution or consumption of data.

The first month of the NBA/NHL/MLB season is irrelevant to most fans, as is TPC-H to anyone who actually analyzes data.

DataRefs is a more efficient join. Learned plenty with MongoDB

The cost/benefit that RStats + Python + JRuby + Amazon EMR has provided to a customer base is, well, immense.

Enterprise software, as it functions and behaves today, is a leading indicator that much of it requires a complete overhaul. Daily these on-premise applications continue to look tired, constraining organizations from growth.

SAP’s offering of the most robust in-line analytical capabilities across its line-of-business applications by far leads any other enterprise vendor. Driving efficiency into every part of the business, from supply chain management to human capital management, with complex algorithms that drive forecasting through to optimization, the embedded functional “intelligence” delivers to the business the right process to execute “competitive analytics”. What SAP hasn’t done well is execute on or integrate these capabilities.

“You keep using that word. I do not think it means what you think it means.”

Advanced Analytics. These two words together look weird and make no sense. This was one topic I had started to write a long-winded blog post on, before stuff happened. For the most part the term has disappeared, and for the better; really it just confused customers. The technologies used to execute many of these techniques and methods (statistical analysis, predictive analysis, data mining, and machine learning) have advanced in many ways. But categorizing the techniques themselves as advanced made it sound like organizations were getting more now than in the past.

BigData. The debate over jargon and concrete definitions of “what is” and “what isn’t” goes on. Rather than focus on practical use cases for the technologies in the BigData space, and on solving issues that currently exist with the tool set, we want to tell people their data doesn’t fit the problem. If one is prevented from turning the data into actionable intelligence in a required period of time (latency) due to the volume, velocity, or structure of the data (multi/poly), then there is a big data problem to deal with. The complexity factor isn’t so much the data, but rather the analytics, calculations, and processing that need to be performed.

The significance of BigData technologies to me is in the problems I see that can be solved, problems I’ve been faced with and growing problems, and in building innovative markets; and yes, it’s more than a “Social Kitten” tool. A blog post to come on this topic.

Complex, statistically improbable things are by their nature more difficult to explain than simple, statistically probable things. -Richard Dawkins

The real problem in analytics. It’s become painfully obvious that the disconnect or confusion in the processing and understanding of data is a misalignment between analysis and synthesis. The focus is on breaking data down into granular parts and identifying the patterns, in order to quantify and connect these findings into a drawn conclusion (e.g. sales results were down). Synthesis is the next step in learning: how do all these parts work together? When they are combined, what new concept or measure do we realize? The technology is there today (machine learning algorithms, map/reduce) to take many data parts and sources and bring them together, coming up with a new solution or finding and completing the decision-making cycle.

I got to most of what I wanted to mention. I need to reformat the blog, add a roll of more enlightening reads than mine, and share more details of what I’ve learned.

Measuring your Measures a BI Afterthought

“An unsophisticated forecaster uses statistics as a drunken man uses lamp-posts – for support rather than for illumination.” – Andrew Lang though unattributed

A recent blog post highlighting the importance of a Business Intelligence Competency Center (BICC) shed some light on establishing value and trust in the data we use to measure results.

The difference is in “how” an organization comes to the realization that a business strategy must be defined to direct the BI initiative if it is to attain any ongoing value. Partial failure, identified early on, can be an invaluable source for the needed measurements and process monitoring; prolonged, the process will reach a state of total failure, leaving few remnants to gather value from.

Absence of Strategy?

An all too familiar recurring task for many: a multitude of reports is whittled down to the required columns, the data manipulated and reformatted in Excel, then distributed to stakeholders without appropriate documentation to support the question being answered. Data quality is suspect, and the data produced, now in doubt and lacking value and context, is misinterpreted.

Resembling the Sunday Times: the same reports are bundled up and circulated among the stakeholders, who individually analyze the data, duplicating the effort of manipulating and summarizing, and leaving the data and its message in a state similar to what Dante described in his journey through the 8th Circle.

Is one more detrimental than the other to the business? In the first scenario, the data is summarized to ease consumption at the cost of context; the second leaves its audience to sort out the details, resulting in the confusion of repeated manipulation and messaging. The conclusion, though, is identical: neither provides the beginnings of, nor builds upon, a sustainable business intelligence strategy.

So why and how does the process, or lack of one, get to this point? Simply stated, either the question was not sufficiently answered or the question being asked was not understood. The business may be efficiently utilizing the data, and IT may be effectively maintaining all the pieces of the BI environment; what has been lost is the point of reference that could curtail the bad habits that form when people attempt to consume data. It now becomes a lesson in improving the organization’s understanding, communication, and measurement of what can be and is being done with the data today.

Fostering a BI Strategy

A fresh approach or fixing what’s in place? While new does have its advantages, either path must still identify and shape the strategy from the business goals and needs captured. It’s crucial that these goals become the blueprint for data democratization, and that the outcome empowers the organization with the ability to learn from the data, process transparency, a way to measure the effectiveness of the answers, and trust in the data produced.

STATIC it is not. Goals, questions, and measurements will evolve over time; to avoid the scenarios above, the strategy must be dynamic enough to optimize and adapt as well. This is not a list of tasks, capabilities, or a technical description of the BI environment; and while there is a starting point, there is no predetermined end point.

TECHNICAL it is not. To embed technology or solutions into the strategy would only inhibit its growth, promote data silos, and lead an organization down the road to failure.

ORGANIC it is. It’s critical that the process, data, and effectiveness of the measures are made transparent; this enables the organization to build on the foundation from the ground up. Questions will be evaluated and determined to be inefficient, redundant, or no longer applicable. From this, new goals, measurements, and data will emerge.

ITERATIVE it is. Whether starting off or salvaging what exists, the design of the strategy and how it is executed must be agile. Start by focusing on a specific business problem or question (e.g. sales and opportunities) to test out how the strategy takes form. As the process matures, measuring and tracking the consumption of measurements and questions will identify which questions can be weeded out based on inactivity.
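As a rough illustration of weeding out questions by inactivity, here is a minimal Python sketch. Everything in it is hypothetical: the usage log, the question names, the `inactive_questions` helper, and the 90-day threshold are assumptions made for the example, not part of any particular BI tool.

```python
from datetime import date, timedelta


def inactive_questions(last_used, today, max_idle_days=90):
    """Return the names of questions not consumed within max_idle_days, sorted."""
    cutoff = today - timedelta(days=max_idle_days)
    return sorted(name for name, used in last_used.items() if used < cutoff)


# Hypothetical usage log: question -> date it was last run or viewed.
usage_log = {
    "sales by region": date(2012, 6, 1),
    "open opportunities": date(2012, 5, 20),
    "legacy fax orders": date(2011, 11, 3),
}

stale = inactive_questions(usage_log, today=date(2012, 6, 15))
print(stale)
```

With this sample log only the question last touched in 2011 is flagged. The threshold is a judgment call; the point is that the candidate list comes from measured consumption rather than opinion.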

Transitioning Strategy to the Tactical

Where should the strategy live? It is a document or collection of guidelines, but it must also be measured against and updated, collaborative, and transparent to all stakeholders. It must be able to collect and capture the required data, mapping the relationships and workflow between goals and questions asked, performance indicators, and measurements.

Reconcile, reconcile, and reconcile. This is an exercise in data discovery: reviewing all available reports, identifying redundancies, asking whether each report is ever consumed, flagging undocumented calculations and measurements, and connecting the dots back to a specific business question or process. The purpose here is to reduce output, bring meaning to the data produced, and make what exists usable.

Constructing a Data Presentation Architecture program going forward. A mix of business intelligence roles and skill sets, it directly contributes to the optimization and success of the BI strategy. It’s designed to bring relevance to the data, communicate meaning, and produce actionable results.

Both the growth of data and the need to analyze the information are increasing daily. It will be just as imperative to measure, validate, and adjust to this growing need, to ensure the right data is being applied to the right question.

BI Does Your Organization Good

I assume most have seen either the milk industry’s TV commercial or print ads declaring that “Milk does the body good”. The particular commercial that caught my attention showed a family playing football who began to deflate or break apart, the point being that without milk, bones become brittle.

It got me thinking: does the same rule apply to organizations and business intelligence? Do operations become brittle when data isn’t analyzed right or, worse, not used at all? While most organizations are using some form of tool(s) to analyze the various moving parts of the business, response to new data requests is slow, the usage rate of these solutions is low due in part to functional complexity, and much of the time the results are maintained in silos. These factors contribute to how organizations become brittle in reacting to events, understanding the data, and sharing the information.

Indicee, a SaaS-based BI solution, has taken to solving this brittleness factor that plagues organizations by providing a cloud-based platform to manage the various sources of data, increase user interaction, and offer innovative ways to analyze and understand the data.

Using the content from existing reports and spreadsheets, Indicee gives the user an intuitive self-service workflow to mine and relate data. This mashing up of data from the various applications and spreadsheet-marts inside the organization strips down the barriers of what would otherwise be data silos. This step-through, do-it-yourself (DIY) approach lets users load data, identify columns of interest, create measurements, and understand source data relationships, in what I would describe as a guided wizard, with ease and flexibility. This gathering and data learning process can be performed in under 30 minutes, which is about the time allotted for the daily status meeting in a traditional BI project.

Now that Indicee understands your data, ongoing report updates grow the data mart. That slow response to change that typically breaks organizations? Indicee gets the problem. There is no reload of data here; rather, Indicee recognizes the change, asks how it needs to be brought into the fold, and this data is now part of your ongoing analysis.

Intuitive question interface

It’s about learning the data, right? It’s at this point typically that continued adoption comes into question and brittleness in the data begins to appear, as users reconcile what was needed versus what is available. Worse is the classic BI user interface, made more for creating Visio diagrams than asking business questions, muddied with drag-and-drop functions, and with inquiries returned in the form of SQL. Indicee has lived up to the principle: find simplicity in the complex, remove the noise, and let me use the language I speak to ask the question. What it describes as an “Intelligence Question Interface” has put some UX into the UI, asking “What” I want to measure, “How” I want to organize my data (date, geographic, product), and what to “Filter” from the information requested. This textual process constructs a sentence describing what information will be retrieved.

Users can then turn this question into a report, a report into a chart, drill down into the details, sort, or even alter the original question. Need to monitor a collection of metrics or KPIs? Reports and charts can be presented in a dashboard to track performance across the organization. Need to add this output to a document or presentation? Simply export the report to either Excel or PDF format, and even include the question asked without having to embed a decoder.

Email: ideal for sending out task or meeting requests; for sharing information, inefficient and lost in translation. The purpose of Business Intelligence was to produce for others to consume, to share and collaborate on results; for the information to really have value it must have context. Indicee has progressed to this next step of BI evolution, providing a platform where information can be shared among groups and users can make and view comments on reports. Simply put, strength in numbers adds strength to your data, reducing the brittleness.
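To make the What / How / Filter idea concrete, here is a minimal Python sketch of assembling such a sentence from three choices. It is purely illustrative: `build_question` and its wording are hypothetical and say nothing about how Indicee actually implements its interface.

```python
def build_question(what, how=(), filters=None):
    """Assemble a plain-language sentence from What / How / Filter choices."""
    sentence = f"Show {what}"
    if how:
        # "How" lists the dimensions used to organize the measure.
        sentence += " by " + " and ".join(how)
    if filters:
        # "Filter" narrows the rows the question applies to.
        clauses = [f"{field} is {value}" for field, value in filters.items()]
        sentence += " where " + " and ".join(clauses)
    return sentence + "."


question = build_question(
    what="total sales",
    how=["month", "product"],
    filters={"region": "West"},
)
print(question)  # Show total sales by month and product where region is West.
```

The appeal of this style is that the sentence doubles as documentation: anyone reading the report can see the question that produced it without decoding a query.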


Indicee is new and there are trade-offs; the strength of its platform is in enhancing end-user data visibility, rather than building on the foundation of bells and whistles that many traditional BI tools offer. My initial impression was that the data functions provided were light, but my questions were being answered and the results were accurate, which matters the most.

The company is developing a strong ecosystem through its VAR channel, which in the long term should increase adoption and yield deeper value from its customers’ data. The current offering called “Quickstarts”, a predefined set of data marts for a number of popular accounting packages, is a welcome contrast in a SaaS BI space that is perceived as too Salesforce (SFDC) centric.

Initial sign-up is simple, based on a freemium model that starts off with a 30-day trial; if you don’t use more than 10 MB of data, it stays that way. As your data requirements grow, the cost scales: $69/mo for 100 MB. Want more users? $149/mo for a workgroup (5 licenses). I would describe the process as more driven by the user buy cycle.

Disclosure: I have been paid for past consulting work at Indicee, though it is no longer a client.

A Database By Any Other Name

Creative forces launch innovation and provide alternatives to a market where choices may not be all that dissimilar. This couldn’t be more apparent than with the current range of alternatives, BigData, NoSQL, or the analytic appliance (ADBMS), looking to displace the de facto RDBMS. The King is dead…long live the new King or, as some call it, cyclical history in progress.

It’s not that this current disruption is lacking in technical or business rationale; rather, it’s the misguided approach some have taken by ruling out the applicability of the traditional DBMS. The RDBMS space over the past 20+ years has become dominated by the likes of Oracle, Microsoft, and IBM, who have exerted a sense of entitlement in creating a “one size fits all” offering to serve a broad range of applications.

This path to generalized optimization has manifested as bloated functionality and, worse yet, scalability constraints, all at the expense of the captive user community. The cumbersome and primitive methods of data access and storage that gave way to the mainstream adoption of the RDBMS are not that far removed from what is motivating organizations to seek out alternatives today. It may be worth a momentary step back to understand the real problem, so we don’t reconstruct what we are trying to replace.

MySQL gave organizations that first sense of freedom as a low-cost, lightweight alternative that could be supported on commodity hardware platforms. For many this removed the licensing constraints of the enterprise DBMS that inhibited them from addressing growing performance overhead on their transactional systems. They moved this activity to what was being termed a “distributed server farm” to perform tasks like ad-hoc querying, data mining, and online shopping. Innovative at the time, and compelling to many clients I worked with.

Point of clarification: when I refer to commodity hardware, I’m not talking about clustering together recycled 386 desktops, but about moving away from supercomputers that scale up and toward clusters of physical hardware that can scale out using more elastic methods like virtualization.


The “BigData” problem being bantered about today is not a new problem; many organizations have been working with and analyzing terabytes and petabytes of data for some time. AT&T and Bell labs developed the Daytona project outside the confines of the a traditional dbms to tackle the analysis of their growing volumes of call data.

What this is becoming more about is the openness and availability of the data, along with the increasing number of sources and devices generating streams of data we will need to make sense of. Moreover, unlike AT&T, organizations don’t have to build their own hardware or software infrastructure; the cloud combined with open source projects for the most part takes care of this.

Amazon supplies us with a public cloud that gives us a scalable server infrastructure at a fractional cost. Google (BigTable, MapReduce), Yahoo (Hadoop), Facebook (Cassandra), and Amazon (Dynamo) have made available the solutions used to manage their BigData problems, either as published designs or as open source projects. Commercially, IBM has put its collective effort into working with large data problems and generated plenty of talk around BigSheets.

The Hadoop project in particular has garnered the most attention. Built on a framework of the Hadoop file system for storage and MapReduce for data processing, it has evolved into an ecosystem that scales with our data storage and processing needs. From HBase for analysis to Hive and Pig, which simplify data queries, Hadoop is on course to righting the data problem.

To learn from and understand the data, the methods used to store and process it are critical. The school bell rang in my ear while I was prototyping statistical models against a number of medium-sized data sets, ranging from 500 GB to 1 TB in size and containing 50+ columns per row. Added into the mix was R, an open source statistical language, to process, data mine, and predict possible outcomes. To get to this point I needed to build a dbms, create a data model, load data, aggregate data, and extract and flatten data so R could process it…STOP and RESET

I refined the approach and thought again about what my initial problem was: understanding and learning the patterns in my data. I took the problem to the cloud and utilized the Amazon EMR platform to load, process, and flatten the data. EMR is constructed on the combined framework of Hadoop and MapReduce and provides APIs that can be interfaced using Python to perform sort, compare, reduce, and map routines, and to output aggregated results that I could then analyze in R. All completed with credit card in hand, for around 10.00 dollars and under 45 minutes of processing time.
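To make the map, sort, and reduce steps concrete, here is a minimal local sketch in Python. It is not the job I ran on EMR; the sample rows, the column layout (a key and one numeric measurement), and the function names are all assumptions made for illustration. It only mimics the shape of the work: emit key/value pairs, sort them by key, aggregate per key, and write a flat file that R could read.

```python
import csv
import io
from itertools import groupby
from operator import itemgetter

# Hypothetical sample rows (key, measurement); not taken from the real data set.
RAW = """sensor_a,10.0
sensor_b,4.0
sensor_a,14.0
sensor_b,6.0
sensor_c,1.5
"""


def mapper(lines):
    """Map step: emit (key, value) pairs from each raw record."""
    for row in csv.reader(lines):
        if len(row) != 2:
            continue  # skip blank or malformed records
        yield row[0], float(row[1])


def reducer(pairs):
    """Reduce step: aggregate all values for each key.

    Pairs must arrive sorted by key, which is what the sort between
    map and reduce guarantees.
    """
    for key, group in groupby(pairs, key=itemgetter(0)):
        values = [value for _, value in group]
        yield key, len(values), sum(values) / len(values)


def run(raw_text):
    """Run map, sort, and reduce locally; return one flat CSV string."""
    mapped = sorted(mapper(io.StringIO(raw_text)), key=itemgetter(0))
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(["key", "count", "mean"])
    for key, count, mean in reducer(mapped):
        writer.writerow([key, count, mean])
    return out.getvalue()


if __name__ == "__main__":
    print(run(RAW), end="")
```

The flat output has one row per key, which is the form R handles comfortably. On a real cluster the mapper and reducer would run as separate processes across many nodes, with the framework doing the sort in between.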

I was able to solve my computational problem by prototyping on Amazon EMR in minimal time, and was isolated from many of the complexities and limitations that can exist with an internally managed Hadoop distributed architecture. Being a batch problem, this can exist and grow in the cloud for some time; a point will be reached where, from a cost perspective, it makes sense to move to dedicated hardware. I guess that’s why companies like Cloudera exist: to help organizations when that time arrives.

My problem isn’t alone in falling outside the typical web data use case that Hadoop is being applied to. BioTech, and more specifically bioinformatics, is benefiting greatly from this framework in solving its growing data needs.


A movement to make a software engineer’s life easier when dealing with modernized data.

Post-relational distributed computing; and, as Michael Stonebraker has described it, breaking down the overhead associated with maintaining the consistency (ACID) properties of a transaction.

Tossing out a legacy, overused query language that adds to the overhead of managing data transactions.

One and two, yes; three, yes and no. Point-in-time analysis using SQL, which I have dealt with, is a good example of where alternative approaches are required. I’m impressed with how SQLStream is addressing this problem.

There are a number of alternatives being released to solve these data scaling problems; MongoDB (document), Redis (key-value), and Neo4j (graph) are the ones I probably get asked about most often. Each has its advantages, but more importantly each brings an open discussion to a hard problem. Consider how the alternative manages consistency, persistence, and availability of the data. How does that fit into the application requirements? The thinking that it’s open source and can be customized will be a full-circle trip to the same problems that the rdbms would have caused.

I was introduced to the NoSQL term, and more so to CouchDB, while on a data mining project. It was a case of two projects converging: one analyzing test results, another sharing the test results and supporting documents across a number of labs. The problem spelled out: high transaction volumes, and a data structure that didn’t fit into a relational model without frequent altering.

A sought-out opinion became a quick introduction to the CouchDB architecture:

Not an rdbms, distributed document database, JSON interface, schema free, maintaining both unstructured and structured data, and data access through a number of open source methods.

I ask:

What are they getting that a traditional dbms can’t provide, and what are they giving up?

The environment held a number of research documents along with 100+ million rows of generated test results that needed to be accessed more frequently, with structures that could change based on the test being executed. Managing and constantly updating a relational data model wasn’t in the plans, nor were the resources to take this on.
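A small sketch may help show what “structures that could change based on the test” means in a schema-free document store. This is plain Python with the standard json module, not CouchDB’s API; the document ids, field names, and values are hypothetical. The point is only that two results with different shapes can sit side by side without altering a shared table definition.

```python
import json

# Hypothetical test-result documents; field names are illustrative only.
docs = [
    # One test records a single reading.
    {"_id": "result-001", "test": "assay_a", "lab": "lab-1", "reading": 0.82},
    # Another test records several readings plus supporting documents.
    {
        "_id": "result-002",
        "test": "assay_b",
        "lab": "lab-2",
        "readings": [0.4, 0.6],
        "attachments": ["protocol.pdf"],
    },
]


def by_lab(documents, lab):
    """Return ids of documents for a lab, tolerating fields a document lacks."""
    return [d["_id"] for d in documents if d.get("lab") == lab]


# Each document serializes to JSON on its own; no shared schema is required.
serialized = [json.dumps(d, sort_keys=True) for d in docs]
```

In a relational model the second document would typically force a new column or a child table; here the new fields simply travel with the document.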

The CouchDB platform gave them a distributed/replicated data environment to handle requests as well as transactional volume; and while it does preserve the ACID properties, ensuring both data consistency and availability, there was a willingness to work in an eventually consistent state (BASE) for the performance gain. This wasn’t a dbms replacement; it was a new problem that until now couldn’t be solved easily.

Analytic Database

Disrupting a space that has long been dominated by Teradata, this new breed of DBMSs has set out to change the economies of scale for many organizations needing to perform analysis and proactive learning on their BigData.

Recently it was described as over-crowded; I like the competitive field and find that it produces much more open discussion, and innovation for that matter. Leveraging either the scalability of standard commodity hardware or optimized devices for performance advantages, and built on some foundation of the PostgreSQL open source dbms engine, Netezza (relational), Greenplum (relational), Vertica (columnar), Aster Data (relational), and ParAccel (columnar) are now the growing forces in the space.

This has led to a steady stream of product evolution and innovation: massive parallelism, in-memory processing, integration of the Hadoop and MapReduce framework to minimize data processing, deployments of their platforms in the cloud, support for solid-state storage (SSD) leading to significant performance gains in data access, and a reduced data movement bottleneck with the hybrid data-application server introduced by Aster Data, which embeds processing logic into the database engine.

Much of the focus to date has been on reducing the access and processing time on the stored data; at some point attention will need to shift to figuring out how to efficiently handle the blend of loading large batch files with real-time data streams.

Co-existing going forward

Neither the rdbms nor SQL is going away anytime soon, for the simple fact that these alternatives were not built as replacements for the dbms but rather to address new data problems. I wouldn’t recommend pulling the rug from under the dbms managing your organization’s financial or ERP systems; there are plenty of best practices around addressing performance concerns with these applications. Most replacements today are due to standardization on a single dbms platform.

I see the offerings being developed by RethinkDB and Drizzle, which are lightening the MySQL framework to improve scalability in their products, as the model or approach going forward. Functionality in these alternatives will be adopted by the dbms vendors to address shortcomings in their own products. The MapReduce model is now being included in Teradata, and I wouldn’t be surprised if Oracle released some integration with the MapReduce/Hadoop framework to meet customer demand. The addition of schema-free support to MySQL may have changed the direction of the project referenced above.

Many of the NoSQL vendors are wrapping a SQL-like interface around their platform to simplify access; but how much functional bloat can be taken on before the platform deviates from its applicable use? Rather than getting caught up in the CAP theorem discussions, or worse the coolness factor, remember this is about living with trade-offs and labeling data as a commodity.

The Future’s So Bright, I Gotta Wear Shades

For the independent record labels, I consider the 80’s to be that combination of golden age and end of an era. The rising costs made it impossible for them to retain their artists and produce records without partnering with, and eventually being acquired by, the larger labels. What made these independents different: they were closer to the grassroots scene, willing to take on the progressive and innovative, and lacked the generic feel one got from the likes of EMI or Sony.

A reflection on the past decade, and on how the theory of “Creative Destruction” has shaped a climate more suited for innovation today, reveals a shinier outlook than the one bestowed on the 80’s independent labels. Joseph Schumpeter, an early 20th century economist, popularized and further developed the theory, in which the process of innovation and progress destroys the old and replaces it with the new.

From the perspective of measured and recorded economic indicators, these past ten years have been described as the “lost decade”, and rightly so. Just look at the employment indicator: an estimated 108 million employed (private services) in 1999, and in 2010, yes, it’s at the same 108 million. This is due in part to the combined loss of net worth caused by both the dot-com and financial bubble bursts, estimated at between 14 and 16 trillion. Though there are positives to show in between, the final numbers are what matter. From an innovation perspective, though, we learned and created a collection of leaner, more agile, and more sustainable environments, keeping the future in mind this time and providing a little light in the tunnel.

The Beginning

At the start of the decade the internet had become THE platform and business model. It had disrupted and changed the way people were searching, doing business, and of course shopping; there was no question about the longevity of this boom. E-commerce was now the buzzword: Amazon and eBay were changing how we buy and extending our reach to sell, and PayPal was disrupting how we performed the simple task of commerce on the internet. Even as the Webvans and boo.coms, who collectively took in 400 MM in capital investment in order to innovate, started to show cracks, there was no looking back or stopping the progress in motion.

This collection of websites was being designed and rolled out on best-of-breed software platforms from ATG, Vignette, and BroadVision, which were created to ease the development and management efforts required. WebLogic coined the phrase “Application Server”, building a common API layer that allows communication between the database and the application; disrupting not only application deployments on the web, but more so putting the concept of client-server to sleep for good. This progression was due in part to the influence of Java in the development space: making the hardware layer transparent at execution, allowing for fewer implementations, and displacing and disrupting C/C++ in future application development.

Along with this rise in interest and activity on the internet came dramatic growth in the data generated, and the need to act on it in real time became a necessity. Much of the effort to date to analyze this semi-structured data would be considered primitive, time consuming, and nowhere close to real time. Personify, a platform I had experience with, would be a pioneer in how we interact with this deluge of data, relate to our online shoppers to understand their behavior, and gain further insight into why Ryan just left his shopping cart in the middle of the digital aisle and walked out the door. Much of what they did gave way to real-time analytics.

The B2B landscape was partaking in this change as well; the web was becoming the presentation layer of choice, and gave way to efficiencies in business processes. Ariba innovated and transformed how we did procurement, providing a transparent landscape to work with vendors. Wanting to differentiate itself from ERP and SCM, it created the term Operating Resource Management (ORM) and started to carve out a market from the leaders, Oracle and SAP. Integration had a new buzzword, EAI, which gave organizations the capability of sharing information between newly acquired and legacy applications with minimal customization, from the likes of Tibco, webMethods, and SeeBeyond.

Reality Returns

So what went wrong? It’s a topic that has had its share of talk time; bottom line, this innovation came at a cost. The simple economics demonstrated that the costs associated with building out and maintaining this infrastructure (software, hardware, hosting) were not supported by the revenue these start-ups were able to generate. Though the wheel of progression was slowed, it had not stopped. A number of start-ups would cease to exist, either from the lack of viability of their business model or as victims of the fallout, and investors were left to lick their wounds, but through it all this didn’t deter those who continued forward. Creative destruction, thought to be stalled, was already paving under an industry barely out of infancy; the future outcome being the scrutinizing of costs and the validity of business models. The sun would rise the next day, and the future was beginning to look okay.

The Looming Clouds (of change)

Emerging from the rubble, Google was beginning to evolve and disrupt; it was described to me as not just a search engine but an incubator of people and information, which to date Google has been quite successful at accomplishing. Apple would deliver the iPod and iTunes, redefining the user experience. Amazon survived to become the A to Z, a platform for third-party sellers, release the Kindle, and provide the technology for the next cog in the wheel of progress, referred to as the cloud. The internet itself was starting a new phase in its progression; the rise of social started to take place with MySpace, YouTube, blogging sites, Facebook, and Twitter. This shift brought with it a more diversified use of the web: a place for community interaction, personal productivity, entertainment, and cost-effective environments where organizations could run their business operations.

Disruption would come in the form of economics: how do we get costs down so we can realize the value? Linux would give us an operating system that worked on commodity hardware; MySQL would provide us with a database that was free, had a minimal footprint, and was fast enough on the read; Apache was the platform to run the application; and PHP gave us the ability to script the code. Each of these was developed separately, and together they formed what would be called the LAMP stack. This made creating applications economically manageable, simple, and quick.

The barriers to entry had started to come down, as did the initial investments made in this new breed of startups, all of which made for a leaner approach to doing business. With this evolution in technology and the agile approaches progressing with it, one would believe that the methods organizations implemented to examine their operational data, better understand their customers’ behaviors, manage risk, and measure financial performance would have been part of this forward progress. But upon further inspection of how these organizations were using business intelligence to support decision-making activities, the same problems were still being discussed and encountered in 2010 as if it were 2001. The functionality being added by the Business Intelligence vendors during this period was considered more style than substance, and was not solving ease-of-use or providing data transparency. To further suppress any possibility of innovation, in the past decade the market leaders would be digested and integrated into the software portfolios of the Oracles, IBMs, and SAPs.

The perception moving forward was that business intelligence would regress to the point that it would be nothing more than a collection of empty buildings collecting maintenance; this couldn’t have been further from the truth. Reminiscent of the same cultivation that propelled change in the earlier part of the decade, newcomers have started to emerge, fresh with innovative ideas, disrupting how we think about and can perform business intelligence activities in these shaky economic conditions.

Employing a proven and now mainstream combination of a SaaS and scalable-cloud model that has been vetted over the past few years by the likes of SFDC, Workday, and NetSuite, these SaaS BI vendors have witnessed swift acceptance in the business intelligence space. These emerging vendors look to distance themselves from being seen as replacements for Business Objects, Hyperion, and Cognos by causing disruption in terms of economics and functional use. A current roll call of vendors in the SaaS BI space includes Birst, GoodData, Indicee, and PivotLink; and while there has been recent fallout of earlier pioneers LucidEra and Blink Logic, the existing vendors have used this to their advantage by adapting their business models and still delivering innovation. They provide users with a platform that offers a higher level of user experience, the beginnings of a collaborative workspace, ease-of-use, and more user-driven self-service options, while lowering the costs, resource requirements, and implementation time frames that have plagued the on-premise tools.

The adoption of SaaS as a deployment platform has extended beyond BI to the analytical functions that are aligned closer to business operations, from sales and supply chain optimization to forecasting, predictive analytics, and performance management. Offerings from Right90, Steelwedge, Aha, eVia, Xactly, Adaptive Planning, Appian, and Host Analytics are the beginnings of a unified approach to monitoring the enterprise, with the same advantages as the SaaS BI vendors. What can the combination of these analytical functions in a single ecosystem do in terms of disruption? To paraphrase Ray Wang, an industry analyst: “The biggest benefit of SaaS is not the software; but the collective intelligence of the network. SaaS moves to information brokering.” Put that together with a recent blog post from Boris Evelson, an analyst with Forrester Research, discussing the estimated cost of producing a business intelligence report, and the answer is: plenty.

Those vendors who design and deploy from the beginning on a cloud-scaling architecture and a common platform (PaaS) that permits open and efficient data integration in turn change the economies of scale associated with the costs of development, support, and onboarding, while at the same time creating an ecosystem that opens new channels for revenue that traditional business intelligence solutions will most likely never have access to. When aligned, all this could produce the most significant disruptor for business intelligence: the introduction of context to the data being analyzed, leaving behind the days of a generated report open to interpretation.

As with any emerging technology, it’s not without challenges in its adoption into the mainstream; and as bright as the future will be for SaaS BI, it will come at a price. Technically, there are concerns related to platform stability and security that need to be addressed; operationally, the cost of sale needs to be monitored closely along with the pricing strategy; and the concept of Freemium needs to become either a limited or a fading trend for these vendors to survive. There most likely will be further fallout as vendors with similar functionality start to differentiate themselves. Those that survive will need to develop strong ISV relationships to extend value and revenue models, and acquisition looms as the on-premise vendors want to establish themselves in the SaaS space. However, the true measure of success will be how much further these vendors can push past the 25% user adoption rate of the existing BI solutions used by organizations today.

Even with these challenges, the climate is much better for these vendors than it was for the dot-com startups of a decade ago, based on the economies of scale alone. The upfront development costs, together with the ongoing management costs, are much lower for these vendors, allowing for further innovation, which should result in increased value for their users and a revenue channel with further reach. These factors should translate into a platform that will allow organizations to ask more questions about their data and close the gap between the event and the actionable decision. Those ripples will be disruptive.

As for my I.R.S. Records t-shirt or hand puppet, they are reminders of what was. Instead, I see these vendors leading the way in realizing the goals and intentions written by H.P. Luhn in 1958 on using business intelligence to support decision-making activities.