Wednesday, October 7, 2009

Cloud: Budgeting for Uncertain Demand


The Economics of Cloud

Enterprises have been collecting large volumes of sales and customer-behavior data through POS and ecommerce systems,
and they are eager to apply predictive analytics to that data to improve the fundamentals of their business.

Predictive analytics turns that data into models that support business decisions.
However, models need the flexibility to change as business requirements change, and the supporting IT
infrastructure needs to change with them. Given the complexity of inter-connected models, the run time to calculate
an optimal price for an item can easily jump from one minute to five. Dedicated IT infrastructure, bought with capital expenditure,
simply cannot scale up and down with that demand.

Cloud computing is the only way out of this conundrum. You start with today’s need,
which is a small set of resources, funded by operational expenses. As the models expand in
complexity, you keep adding incremental resources. You never have to provision for peak usage,
where a large investment sits idle most of the time, and you never have to worry about running out of capacity in the middle of a vital model run.

Analytics on the Cloud

Forecast Horizon has been taking advantage of Amazon Web Services (AWS) to build a high-performing analytical stack,
which has ZERO fixed cost and scales on demand.

Storage

Most enterprises pay commercial vendors large sums for data storage, and that storage is dedicated to running production applications.
Replicating a large data set for analysis purposes using the same commercial tools is prohibitively expensive.
Replicating the same data on a system like Amazon's S3, purely for analysis, is very cheap. Since analytical techniques are often run in batch mode,
the data remains in passive storage most of the time and gets attached to a processing engine
(either an open-source RDBMS like MySQL or a map-reduce framework like Hadoop) on an as-needed basis.
AWS is designed to separate storage cost (paid per GB-month) from processing cost (paid per CPU-hour),
so you do not pay for CPU when data sits in passive storage.
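
As an illustration, here is a minimal Ruby sketch of that pattern using the aws-s3 gem (any S3 client would work); the bucket and file names are hypothetical:

# Park a replicated data set in S3 and pull it down only when a batch run needs it.
# Bucket and file names are hypothetical; requires the aws-s3 gem.
require 'aws/s3'

AWS::S3::Base.establish_connection!(
  :access_key_id     => ENV['AMAZON_ACCESS_KEY_ID'],
  :secret_access_key => ENV['AMAZON_SECRET_ACCESS_KEY']
)

# One-time upload: the data now sits in passive storage, billed per GB-month.
AWS::S3::S3Object.store('pos_transactions.csv', open('pos_transactions.csv'), 'analysis-replica')

# Later, on a freshly started compute node, fetch it just before the batch run.
File.open('pos_transactions.csv', 'w') do |f|
  f.write AWS::S3::S3Object.value('pos_transactions.csv', 'analysis-replica')
end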

Software

Forecast Horizon uses open-source packages such as R for statistical modeling and
COIN-OR for complex, non-linear optimization. While these packages can run outside
the cloud, the fact that they are open source means they can be replicated across a large number of machines
without getting bogged down by licensing issues. At a Fortune 500 customer, Forecast Horizon was running over
1,000 CPUs in parallel, and per-CPU licensing would have rendered the exercise prohibitively expensive.

Compute Power

This is where AWS really shines for retail analytics. Take, for instance, the problem of price optimization,
where running a moderately complex algorithm for one product can easily take a minute. A mid-sized retailer
carries 10,000 products, which means a price-optimization batch run takes 10,000 minutes, roughly a week, on a single CPU running serially.
This is clearly not an acceptable solution from a business standpoint, because by the time the model has produced an answer,
it is already obsolete due to changing business conditions (sales, inventory, demand, etc.).

Using AWS, Forecast Horizon can split the same problem across 100 servers running in parallel,
reducing the run to 100 minutes, or across 1,000 servers to reduce it to 10 minutes.
Forecast Horizon does not own any of the servers and pays only for the CPU hours consumed. It
also does not have to worry about the underlying architecture that makes it possible to spin up 10, 100, or 1,000 servers with a single API call.
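
As a rough sketch of that fan-out, using the right_aws gem; the AMI ID, key pair, and the user-data convention for handing each node its slice of work are all hypothetical:

# Spin up worker instances for a batch run, then shut them down when it finishes.
# AMI ID, key pair, and user-data convention are hypothetical; requires the right_aws gem.
require 'right_aws'

ec2 = RightAws::Ec2.new(ENV['AMAZON_ACCESS_KEY_ID'], ENV['AMAZON_SECRET_ACCESS_KEY'])

# One API call requests 100 copies of a pre-built worker image.
instances = ec2.launch_instances('ami-12345678',
  :min_count     => 100,
  :max_count     => 100,
  :key_name      => 'forecast-worker',
  :instance_type => 'c1.medium',
  :user_data     => 'batch=price_optimization run_id=42')  # each node picks up its share of products

instance_ids = instances.map { |i| i[:aws_instance_id] }

# ... workers pull data from S3, run the optimization, write results back ...

# When the run is done, stop paying: terminate everything.
ec2.terminate_instances(instance_ids)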

Summary

Despite a lot of hype, misconceptions, and derision, the AWS cloud is a ground-breaking tool for
complex analysis of large volumes of data. S3 offers cheap storage, and EC2 instances offer CPUs on demand,
scaling easily from 1 to 10 to 1,000. While it may not make sense for enterprises to shift their
payroll and accounts receivable to the cloud overnight, it offers them the opportunity to ask questions that they
never imagined could be answered.

Sunday, March 15, 2009

The art and science of retail

Apparel retail is a fine blend of art and science. The art of merchandising is what makes this business fresh and exciting, something that is often lost on the inner geek of yours truly. Last week, a senior retail executive noticed my eyes glaze over when discussing knits and wovens. She pointed to my shirt and said, "That's a knit, and you hang them to display"; then she pointed at my sweater and said, "That's a woven, you fold them to display". We had a good laugh, and I realized that even after spending close to a decade with retail data, there are obvious blind spots in my understanding of the art.

Numbers and data have been the territory of retail science. But perhaps data visualization, which I believe is an art form in itself, can bring the science closer to the art. Here is an example of a simple dashboard, created with the fantastic jQuery sparklines plugin by Gareth Watts. There is very little JavaScript to hand-code: just pass values to the plugin, and you end up with a useful dashboard.

This is where the science of retail can truly sparkle (pun intended). Retail, like any business, starts with a plan. Success is often defined by how closely you can deliver on the plan. Bullet graphs are very useful in this context. One of the metrics that we measure closely for our clients is planned versus actual sell-through. You do not want to be too far off the plan. If you are beating the plan handily, it may be good for your profit margins, but you may end up with empty shelves, because your vendors will not be able to supply fast enough. If you fall too far behind the plan, your cash gets tied up in aging inventory, and your business can choke. Forecast Horizon optimization works by constantly updating recommendations that bring you back to plan.
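
As a back-of-the-envelope illustration, here is a minimal Ruby sketch of the planned-versus-actual comparison. It assumes sell-through is defined as units sold divided by units received over the period; the numbers and thresholds are made up:

# Planned vs. actual sell-through (hypothetical numbers and thresholds).
# Assumes sell-through = units sold / units received over the period.
def sell_through(units_sold, units_received)
  units_sold.to_f / units_received
end

planned = 0.45                      # planned sell-through for the period: 45%
actual  = sell_through(380, 1000)   # 380 of 1,000 received units sold => 38%

deviation = actual - planned        # -0.07: seven points behind plan
if deviation < -0.05
  puts "Behind plan: consider a markdown to move aging inventory"
elsif deviation > 0.05
  puts "Ahead of plan: watch for empty shelves before the next receipt"
else
  puts "Close to plan"
end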

Tuesday, March 10, 2009

A hammer won't fix your woes

Forecast Horizon is an optimization solutions framework. In other words, unlike a typical optimization software application, Forecast Horizon is geared towards configurability and customization. Traditional optimization models answer a specific question based on a rigid set of inputs. The model is specialized because it is meant for a specific purpose – like a hammer. Forecast Horizon can be adapted to the needs and complexities of your business – like some wood and a chunk of steel. You could make a hammer, or any number of other things, with the wood and steel.

Of course, while chunks of wood and steel are more “configurable” than a hammer, they aren’t terribly useful because few people have the specialized knowledge to work with such raw materials. Forecast Horizon’s purpose is to sit in the sweet spot between the two ends of the scale, and create a sort of “builder’s kit” made up of pre-designed components that can be used as-is or can be extensively reconfigured to suit your needs. Its design provides incredible flexibility while still allowing people who aren’t scientists to understand the science and make powerful decisions.

Tuesday, February 24, 2009

Rails Deployment Blues

While developing in Rails is a joy, deployment can still be an adventure. There is a very interesting post on the Heroku blog on this topic. Here is a short list of hiccups I experienced, which you, gentle reader, can avoid:
  • Slicehost is great, and a $20 slice is a good way to get started. I went with Ubuntu 8.10 (intrepid).
  • Start with the standard steps to secure and set up the slice.
  • Figuring out what to install from source versus from a package is tough. After many false starts, it seems the only thing you need to install from source is RubyGems. DHH notes this in his plug for Phusion Passenger, and he is right. Despite reports to the contrary, there are no issues with the Ruby 1.8.7 that comes from the Ubuntu package. MySQL and Apache worked fine from packages as well.
  • Once you have Ruby 1.8.7 and RubyGems 1.3.1, Phusion Passenger should install without problems.
  • Ubuntu oddity: MySQL does not put mysql.sock under /tmp, as Rails expects. It puts mysqld.sock under /var/run/mysqld, so point database.yml at that socket (see the sketch after this list). Thank you Herval. Also, very cool Phil Ochs lyrics on your page: one of my favorite songs.
  • If using Capistrano, make sure you have a non-GUI svn client on your local machine.
  • Deployment oddity: script/process/reaper was showing up as non-executable. I am still not sure why this was happening, but Topfunky had a good solution. I am really starting to see the power of Capistrano: you can bundle all these server-side commands into a single file that runs from your local machine (see the sketch after this list).
  • Even bigger deployment oddity: when everything is finally almost working, something remains amiss. I can start the app with WEBrick, but not with Phusion Passenger. One last change: config/environment.rb must be owned by www-data. Again, put it in a Capistrano block, but I have no idea why this is necessary.
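
For reference, here is a minimal sketch of how these fixes might be bundled together; the task name is made up, and the database.yml line and ownership fix are my reading of the items above, not the exact recipes from the posts linked there:

# config/database.yml (production): point Rails at Ubuntu's MySQL socket
#   socket: /var/run/mysqld/mysqld.sock

# config/deploy.rb: bundle the server-side fix-ups into a Capistrano task
namespace :deploy do
  desc "Make the reaper script executable and hand environment.rb to www-data"
  task :fix_permissions, :roles => :app do
    run  "chmod +x #{current_path}/script/process/reaper"
    sudo "chown www-data #{current_path}/config/environment.rb"
  end
end

after "deploy:symlink", "deploy:fix_permissions"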

Wednesday, November 19, 2008

Restful Nested Resources in Rails 2.0

There are several tutorials around for Rails 2.0 and nested resources (Akita, Adam at Heroku). I wish I could get hold of Adam's code files, but I keep getting a 404 Not Found.


I struggled a lot and finally figured out how to do this right. I have been keeping a log of my own coding to share with my team, and these are my logging notes. Email me if you want the actual files, and I will be happy to share.

In my case, I have a parent 'user', that can own multiple children 'categories'.

Steps:
1> I use straight-up scaffolding to generate the models, controllers, and migrations for User and Category. BTW, only later, reading Akita, did I discover the option of declaring "t.references :user" in the 'categories' migration. An interesting tidbit that I did not know.
2> For routes.rb, following Adam's path, I do not create a separate category resource. I use the simple has_many declaration to denote that categories are children of user. Note that I am using the acts_as_authenticated plugin, and this had to come after all the other 'user'-related declarations in the routes file.
map.resources :users, :has_many => :categories
3> In the User model, declare has_many :categories, and in Category model, declare belongs_to :user
4> Following Adam's instructions, type 'rake routes', and let your eyes bleed over the keyboard. Try to look for the routes that give you 'user_categories'-like paths.
5> After this, Adam's examples become a little hard to follow, and I will try to spell out the next steps in gory detail. Hopefully, it will save some newbies like me a couple of hours.


6> Changes in categories_controller.rb (a consolidated sketch of the finished controller appears after step 10):
6a> Start at the very bottom: Create a private method that loads the user from params:
6aa>
private
def load_user
  @user = User.find(params[:user_id])
end
6ab> Then use this in a before_filter :load_user at the very top of the controller.
6b> For 'index' method, modify line 7: @categories = @user.categories.find(:all)
6c> No change in 'show'
6d> For new, @category = Category.new(:user_id => @user.id). Not sure this is absolutely necessary.
6e> No change in 'edit'
6f> For 'create': @category = @user.categories.build(params[:category]). Note the new method 'build', called on the @user object.
6ff> The 'redirect' changes to format.html { redirect_to ([@user, @category]) }. Note that it is passing an array.
6g> In update, same 'redirect' change, format.html { redirect_to ([@user, @category]) }
6h> In 'destroy', redirect change, { redirect_to(user_categories_url(@user)) }

7> Changes in the 'index' view: Note how 'user' is prefixed to all 'category_path'
7a> 'Show' link: link_to 'Show', user_category_path(category.user, category)
7b> 'Edit' link: link_to 'Edit', edit_user_category_path(category.user, category)
7c> 'Delete' link: link_to 'Delete', user_category_path(category.user, category), :confirm => 'Are you sure?', :method => :delete

8> Changes in the 'new' view:
8a> form_for([@user, @category]) do |f|: passing an array of the @user and @category objects
8b> Link 'back' to index: link_to 'Back', user_categories_path

9> Changes in the 'edit' view:
9a> form_for([@user, @category]) : Again, passing two objects in the form
9b> 'show' link needs 'user' in the path, as well as the two objects: link_to 'Show', user_category_path(@user, @category)
9c> 'back' link simply needs to put the 'user' prefix: link_to 'Back', user_categories_path

10> Changes in the 'show' view: Only @category object needed here. Changes are in the links
10a> 'edit' link: edit_user_category_path(@category.user, @category)
10b> 'back' link simply needs to put the 'user' prefix: link_to 'Back', user_categories_path
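
For reference, here is a consolidated sketch of categories_controller.rb after the step 6 changes. It is simplified: the scaffold's format.xml responders and the unchanged 'show' and 'edit' actions are omitted.

# app/controllers/categories_controller.rb (simplified sketch)
class CategoriesController < ApplicationController
  before_filter :load_user

  # GET /users/:user_id/categories
  def index
    @categories = @user.categories.find(:all)
  end

  # GET /users/:user_id/categories/new
  def new
    @category = Category.new(:user_id => @user.id)
  end

  # POST /users/:user_id/categories
  def create
    @category = @user.categories.build(params[:category])
    respond_to do |format|
      if @category.save
        format.html { redirect_to([@user, @category]) }
      else
        format.html { render :action => "new" }
      end
    end
  end

  # PUT /users/:user_id/categories/:id
  def update
    @category = Category.find(params[:id])
    respond_to do |format|
      if @category.update_attributes(params[:category])
        format.html { redirect_to([@user, @category]) }
      else
        format.html { render :action => "edit" }
      end
    end
  end

  # DELETE /users/:user_id/categories/:id
  def destroy
    @category = Category.find(params[:id])
    @category.destroy
    respond_to do |format|
      format.html { redirect_to(user_categories_url(@user)) }
    end
  end

  private

  def load_user
    @user = User.find(params[:user_id])
  end
end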

This is a lot of typing and manual work. There should be a plugin for this, and sure enough, there is one (http://deaddeadgood.com/2008/10/8/scaffolding-nested-resources-in-rails). However, I am a Windows user and have not yet discovered the joys of Git, which automatically puts you in Rails purgatory. I believe I am the only PC user who comes to our local Ruby meetup. There is hope, though (http://jamie.ideasasylum.com/2008/08/installing-rails-plugins-with-git-on-windows/); I haven't tried it out yet. I do think it is not fair to impose this much pain on Windows users just to install plugins.

Thursday, July 24, 2008

Fusion Charts with Ruby on Rails

Seasonality is an important component of forecasting, and thanks to Google Trends, we can quickly get multi-year seasonality for a category such as 'Turtlenecks'. After downloading the CSV file, I wanted to reproduce the chart from the data on our web-page.

Fusion Charts is a fantastic charting package, although so far we can only afford the free version. I will show a particular trick for extending Fusion Charts Free to manipulate the X-axis labels. The time series from Google Trends had 234 weeks of data, which meant 234 labels. These ran into each other, creating an illegible axis. Also, the default mode for Fusion Charts is to display the value of each data point; with a lot of data points, these crowd the graph. The results, as seen here, are a far cry from the beautiful graphs of Google Trends.


A little bit of Ruby goes a long way toward fixing this problem. First of all, I need to tell Fusion Charts to suppress the values on the graph. This is done by setting the showValues parameter to 0 in the XML header. I am using the excellent Builder library, where I can pass this as a key-value pair, :showValues => 0. Next, I want to show the week label only for every 13th week. The paid version of Fusion Charts has a parameter setting for this, but for the cheapskates of the world, Ruby to the rescue again: the trick is to set the showName parameter to 0 on every line of the XML except each thirteenth one. Finally, let's rotate the labels so they are vertically oriented, reducing the overlap. This is done by passing another key-value pair to the XML header, :rotateNames => 1. The final code to generate the XML is posted below. The new graph, after the changes, is much cleaner.


# Builder template: suppress point values, rotate the X-axis labels,
# and show a week label only for every 13th data point.
counter = 0
xml.graph(:showValues => 0, :rotateNames => 1) do
  @trend_weeks.each do |tweek|
    counter += 1
    if counter.modulo(13) == 0
      xml.set :name => tweek.week_date, :value => tweek.index
    else
      xml.set :name => tweek.week_date, :value => tweek.index, :showName => 0
    end
  end
end
I am still new to Ruby, and am sure there is a more idiomatic way of doing this. Please post it in the comments.
Also, it took me a very long time to get Fusion Charts to work the first time in Rails. It was mostly little mistakes that I kept making, as well as an incomplete understanding of how routes work in the 2.0 world. I may get around to posting a full tutorial on this subject. If somebody needs to get started quickly, please contact me and I can try to help you. Happy Charting!

Saturday, July 19, 2008

The Evolution of Analytics



Enterprises today are awash in data. POS systems capture every transaction for a retailer, CRM systems maintain rich profiles of customer preferences, and ERP systems maintain transactional information at the most granular levels. Until the mid-nineties, only very large companies, the ones that could pay for SAP or Oracle, could afford a reliable data infrastructure. However, even the large companies had to rely on smaller, niche players for analytical and optimization capabilities. The late nineties saw a number of companies starting to offer analytical products around supply chain optimization (i2, Manugistics). Soon, companies followed with price optimization based on advanced scientific algorithms (ProfitLogic, DemandTec, etc.). The analytical evolution had come a long way: Oracle, SAP, and Teradata provided the data infrastructure; reporting vendors (Cognos, MicroStrategy) provided ad-hoc access to information and knowledge; and specialized vendors provided prescriptive modeling and optimization capabilities.

What has changed in the last couple of years is that high level data infrastructure is no longer limited to large companies. The giants like SAP are now catering to small and medium businesses (SMBs) with simpler offerings, and players like NetSuite are now providing extremely robust ERP systems as a hosted service. While data infrastructure is now available to the masses, science-based analytical software is still very expensive to build and distribute, and mostly beyond the SMB's price-range. SMBs are still lagging in the analytical evolution, although the appetite for analytical excellence is definitely palpable.

A combination of factors is changing this picture, giving SMBs access to high-quality analytics at a price point they can afford. The biggest contribution comes from the open-source community, in the form of relational databases (MySQL, PostgreSQL), web-development frameworks (Ruby on Rails, Django), and optimization programs and solvers (COIN-OR, SciPy). Ruby on Rails makes it possible to go from a business meeting to a working application in a matter of weeks, not months or years. Amazon Web Services provides the physical infrastructure that allows running the complex algorithms on its compute cloud for less than the price of a cup of coffee at Starbucks. As these innovations drive costs down and expand the supply of high-quality analytics, the demand for these services is increasing as well. Enterprises large and small are becoming increasingly aware of the power of data and data-driven models, and are seeking to gain a competitive edge through analytics (Competing on Analytics).

The future centers of analytical excellence may be quite different from what we are used to today. Today's paradigm is niche analytical companies selling to large enterprises: the sales cycle is long and complex, license and implementation costs run into six figures, and hosting and maintenance are just as expensive. The future may be ruled by small, nimble shops providing high-quality, low-cost intelligence over the web, in a simple Do It Yourself (DIY) mode. Are you ready for the brave new world of analytics?