The Hiatus

Per ipsum sit scientia

When I started this blog in 2010, my intent was to provide a different perspective on data, analytics, and the technologies evolving around them. It’s been over a year since I last posted. It’s not that I didn’t have topics; I drafted more than four posts, but I found each topic was either already cluttered by discussion or, worse, I wasn’t really providing any further insight.

Why the renewal now? First, my exposure to a variety of organizations, and deciphering the data problems each faced, has yielded a breadth of experience. Second, engaging with or applying a range of technologies, both old and new, has given me subject matter worth writing about. And on a personal note, I was given a finite period to enjoy the company of a long-time companion who had given me her unconditional support; I owed her beach and fetch time.

Observations and lessons learned:

An unwanted side effect of complexity is failure. “Plan for it” is discussed, even documented, but more often than not it’s poorly implemented. Designing distributed computing systems is not easy; if anything, you’ve introduced new problems to manage, and availability and consistency shouldn’t be on that list. I was taught a long time ago that if it’s not stable, it won’t scale, and you can forget adaptable.

Code is now a commodity. It has been displaced by the consumerization of information, with the emphasis shifting to data reuse. Transportable, and assembled in the most efficient technology, code is now an interchangeable module that no longer constrains the distribution or consumption of data.

The first month of the NBA/NHL/MLB season is irrelevant to most fans, as is TPC-H to anyone who actually analyzes data.

DataRefs are a more efficient join. I learned plenty with MongoDB.

The cost/benefit that RStats + Python + JRuby + Amazon EMR has provided to a customer base is immense.

Enterprise software, as it functions and behaves today, is a leading indicator that much of it requires a complete overhaul. Daily, these on-premise applications continue to look tired, constraining organizations from growth.

SAP’s offering is the most complete: it has the most robust in-line analytical capabilities across its line-of-business applications, and by far leads any other enterprise vendor. Driving efficiency into every area from supply chain management to human capital management, with complex algorithms for everything from forecasting to optimization, the embedded functional “intelligence” delivers to the business the right process to execute “competitive analytics”. What SAP hasn’t done well is execute on or integrate these capabilities.

“You keep using that word. I do not think it means what you think it means.”

Advanced Analytics. These two words together look weird and make no sense. This was one topic I had started to write a long-winded blog post on before stuff happened. For the most part, the term has disappeared, and for the better; really, it just confused customers. The technologies used to execute many of these techniques and methods (statistical analysis, predictive analysis, data mining, and machine learning) have advanced in many ways. But categorizing the techniques themselves as advanced made it sound like organizations were getting more now than in the past.

BigData. The jargon and the debate over concrete definitions of “what is” and “what isn’t” continue. Rather than focusing on practical use cases for the technologies in the BigData space, and on solving issues that currently exist with the tool set, we want to tell people their data doesn’t fit the problem. If one is prevented from turning the data into actionable intelligence in a required period of time (latency) due to the volume, velocity, or structure of the data (multi/poly), then there is a big data problem to deal with. The complexity factor isn’t so much the data, but rather the analytics, calculations, and processing that need to be performed.

The significance of BigData technologies to me is the problems I see that can be solved: problems I’ve been faced with, growing problems, and the building of innovative markets. And yes, it’s more than a “Social Kitten” tool. A blog post on this topic is to come.

Complex, statistically improbable things are by their nature more difficult to explain than simple, statistically probable things. -Richard Dawkins

The real problem in analytics. It’s become painfully obvious that the disconnect or confusion in the processing and understanding of data comes from the misalignment between analysis and synthesis. The focus is on analysis: breaking down data into granular parts and identifying the patterns, to quantify and connect these findings into a drawn conclusion (e.g. sales results were down). Synthesis is the next step in learning: how do all these parts work together? When they are combined, what new concept or measure do we realize from them? The technology is there today (machine learning algorithms, map/reduce) to take many data parts and sources and bring them together, coming up with a new solution or finding and completing the decision-making cycle.
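As a rough illustration of synthesis with a map/reduce pattern, here is a minimal Python sketch. Everything in it is an assumption made up for the example: the two sources (`sales` and `spend`), their field names, and the derived measure (revenue per marketing dollar) are hypothetical and do not come from any real data set. It only shows the shape of the idea: map records from several sources to a common key, reduce them into totals, then derive a measure that neither source holds on its own.

```python
from collections import defaultdict
from functools import reduce

# Hypothetical sample data for illustration only.
sales = [
    {"region": "east", "revenue": 120.0},
    {"region": "west", "revenue": 80.0},
    {"region": "east", "revenue": 60.0},
]
spend = [
    {"region": "east", "marketing": 30.0},
    {"region": "west", "marketing": 40.0},
]


def map_record(record):
    """Map step: emit (region, partial totals) for a record from either source."""
    partial = {
        "revenue": record.get("revenue", 0.0),
        "marketing": record.get("marketing", 0.0),
    }
    return record["region"], partial


def reduce_pairs(acc, pair):
    """Reduce step: add one record's partial totals to its region's running totals."""
    region, partial = pair
    acc[region]["revenue"] += partial["revenue"]
    acc[region]["marketing"] += partial["marketing"]
    return acc


def synthesize(*sources):
    """Combine all sources and derive revenue per marketing dollar by region."""
    mapped = (map_record(r) for source in sources for r in source)
    empty = defaultdict(lambda: {"revenue": 0.0, "marketing": 0.0})
    totals = reduce(reduce_pairs, mapped, empty)
    # Synthesis: a measure that exists only once the parts are brought together.
    return {
        region: t["revenue"] / t["marketing"]
        for region, t in totals.items()
        if t["marketing"] > 0
    }


result = synthesize(sales, spend)
print(result)
```

The analysis half would stop at the per-source totals (sales were down in the west); the ratio is the kind of combined finding the synthesis half is after.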

I got to most of what I wanted to mention. I need to reformat the blog, add a roll of reads more enlightening than mine, and share more details of what I’ve learned.