Wednesday, October 7, 2009

Cloud: Budgeting for Uncertain Demand


The Economics of Cloud

Enterprises have been collecting large volumes of sales and customer behavior data through POS and ecommerce systems,
and they are eager to leverage the power of predictive analysis of the data to improve the fundamentals of their business.

Predictive analytics take advantage of the data, building models that help support business decisions.
However, models need the flexibility to change as business requirements change, and the supporting IT
infrastructure needs to change as well. Given the complexity of inter-connected models, the run time to calculate
the optimal price for an item can easily jump from one minute to five minutes. Dedicated IT infrastructure, funded by capital expenditure,
simply cannot scale up with demand.

Cloud computing is the only way out of this conundrum. You start with today’s need,
which is a small set of resources, funded by operational expenses. As the models expand in
complexity, you keep adding incremental resources. You never have to plan for a heavy peak usage,
where the large investment sits idle for the most part, and you never have to worry about running out of capacity in the middle of a vital model run.

Analytics on the Cloud

Forecast Horizon has been taking advantage of Amazon Web Services (AWS) to build a high-performing analytical stack,
which has ZERO fixed cost, and scales on demand.

Storage

Most enterprises are paying big sums to commercial vendors for data storage, and these are dedicated to running production applications.
The cost of replicating a large data set for analysis purposes using the same commercial tools is prohibitively expensive.
Replicating the same data on a system like Amazon’s S3 storage, purely for analysis, is cheap by comparison. Since analytical techniques are often run in batch mode,
the data remains in passive storage most of the time, and gets attached to a processing engine
(either an open-source RDBMS like MySQL or a map-reduce framework like Hadoop) on an as-needed basis.
AWS is designed to separate storage cost (paid per GB per month) from processing cost (paid per CPU-hour),
so you do not pay for CPU costs when data is in passive storage.
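
To make the split between storage and processing cost concrete, here is a minimal sketch in Python. The function name and every number in it are hypothetical; the rates are placeholders chosen for easy arithmetic, not actual AWS prices.

```python
def monthly_cost(stored_gb, cpu_hours, rate_per_gb_month, rate_per_cpu_hour):
    """Return (storage_cost, compute_cost) for one month.

    Both rates are caller-supplied placeholders, not real AWS prices.
    """
    storage_cost = stored_gb * rate_per_gb_month
    compute_cost = cpu_hours * rate_per_cpu_hour
    return storage_cost, compute_cost


# Hypothetical month 1: 500 GB sits in passive storage, no model runs.
idle = monthly_cost(500, 0, rate_per_gb_month=0.10, rate_per_cpu_hour=0.50)

# Hypothetical month 2: same data, plus one batch run using 20 CPU-hours.
busy = monthly_cost(500, 20, rate_per_gb_month=0.10, rate_per_cpu_hour=0.50)

print("idle month (storage, compute):", idle)
print("busy month (storage, compute):", busy)
```

The storage component is the same in both months; only the compute component moves, and it is zero when nothing is running.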

Software

Forecast Horizon uses open-source packages such as R for statistical modeling, and
COIN-OR for complex, non-linear optimization. While these packages can run outside
the cloud, the fact that they are open-source means that they can be replicated across a large number of machines
without getting bogged down by licensing issues. At a Fortune 500 customer, Forecast Horizon was running over
1,000 CPUs in parallel, and per-CPU licensing would have rendered the exercise prohibitively expensive.

Compute Power

This is where AWS really shines for retail analytics. Take for instance the problem of price-optimization,
where running a moderately complex algorithm for a product can easily take one minute. A mid-sized retailer
carries 10,000 products, which means that a price-optimization batch run takes 10,000 minutes (nearly seven days) on a single CPU in serial mode.
This is clearly not an acceptable solution from a business standpoint, because by the time the model has produced an answer,
it is already obsolete due to changing business conditions (sales, inventory, demand, etc.).

Using AWS, Forecast Horizon can split the same problem across 100 servers running in parallel,
to reduce the run time to 100 minutes, or across 1000 servers running in parallel, to reduce it to 10 minutes.
Forecast Horizon does not have to own any of the servers, and only pays for the CPU hours consumed. Forecast Horizon
also does not have to worry about the underlying architecture that makes it possible to spin up 10, 100 or 1000 servers with a single API call.
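
The arithmetic behind these numbers can be sketched in a few lines of Python. This is an idealized model, not Forecast Horizon's actual scheduler: it assumes each product can be priced independently and ignores server start-up and data-transfer time. The function names are hypothetical.

```python
import math


def batch_run_minutes(num_products, minutes_per_product, num_servers):
    """Wall-clock minutes when products are spread evenly across servers.

    The slowest server handles ceil(num_products / num_servers) products.
    """
    products_per_server = math.ceil(num_products / num_servers)
    return products_per_server * minutes_per_product


def split_products(product_ids, num_servers):
    """Assign products to servers round-robin; returns one list per server."""
    return [product_ids[i::num_servers] for i in range(num_servers)]


# The example from the text: 10,000 products at one minute each.
for servers in (1, 100, 1000):
    print(servers, "server(s):", batch_run_minutes(10_000, 1, servers), "minutes")

# A tiny illustration of the split itself: 10 products over 3 servers.
print(split_products(list(range(10)), 3))
```

Note that the total CPU time is 10,000 CPU-minutes in every case. In this simplified model, adding servers shortens the wait without changing the amount of computation done.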

Summary

Despite a lot of hype, misconceptions, and derision, the AWS cloud is a ground-breaking tool for
complex analysis of large volumes of data. S3 offers cheap storage, and EC2 instances offer CPU on demand,
scaling easily from 1 to 10 to 1000. While it may not make sense for enterprises to shift their
payroll and accounts receivable to the cloud overnight, it offers them the opportunity to ask questions that they
did not imagine could be answered.

Sunday, March 15, 2009

The art and science of retail

Apparel retail is a fine blend of art and science. The art of merchandising is what makes this business fresh and exciting, something that is often lost on the inner geek of yours truly. Last week, a senior retail executive noticed my eyes glaze over when discussing knits and wovens. She pointed to my shirt, and said, "That's a knit, and you hang them to display"; and she pointed at my sweater and said, "That's a woven, you fold them to display". We had a good laugh, and I realized that even after spending close to a decade with retail data, there are obvious blind spots in my understanding of the art.

Numbers and data have been the territory of retail science. But perhaps data visualization, which I believe is an art form in itself, can bring the science closer to the art. Here is an example of a simple dashboard, created with the fantastic sparklines package by Gareth Watts. There is very little JavaScript that you have to hand-code: just pass values to this jQuery plugin, and you end up with a useful dashboard.

This is where the science of retail can truly sparkle (pun intended). Retail, or for that matter any business, starts with a plan. Success is often defined by how closely you can deliver on the plan. Bullet graphs are very useful in this context. One of the metrics that we measure closely for our clients is planned versus actual sell-through. You do not want to be too far off the plan. If you are beating plan handily, it may be good for your profit margins, but you may end up with empty shelves, because your vendors will not be able to supply fast enough. If you fall too far behind plan, your cash gets tied up in aging inventory, and your business can choke. Forecast Horizon optimization works by constantly updating recommendations that bring you back to plan.
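
As a rough illustration of the planned-versus-actual comparison, here is a small Python sketch. It assumes sell-through is defined as units sold divided by units received, a common convention that this post does not spell out. The function names, the 5-point tolerance band, and the sample figures are all hypothetical.

```python
def sell_through(units_sold, units_received):
    """Fraction of received inventory that has been sold so far."""
    if units_received <= 0:
        raise ValueError("units_received must be positive")
    return units_sold / units_received


def plan_status(actual, planned, tolerance=0.05):
    """Classify actual sell-through against plan.

    `tolerance` is an assumed band (5 percentage points) around the plan.
    """
    gap = actual - planned
    if gap > tolerance:
        return "ahead of plan"   # risk: empty shelves
    if gap < -tolerance:
        return "behind plan"     # risk: cash tied up in aging inventory
    return "on plan"


# Hypothetical item: 1,000 units received, 450 sold, plan called for 60%.
actual = sell_through(450, 1000)
print(actual, plan_status(actual, planned=0.60))
```

A bullet graph shows the same three ingredients visually: the actual value as a bar, the plan as a marker, and the tolerance band as shading behind them.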

Tuesday, March 10, 2009

A hammer won't fix your woes

Forecast Horizon is an optimization solutions framework. In other words, unlike a typical optimization software application, Forecast Horizon is geared more towards configurability and customization. Traditional optimization models answer a specific question based on a rigid set of inputs. The model is specialized because it is meant for a specific purpose, like a hammer. Forecast Horizon can be adapted to the needs and complexities of your business, like some wood and a chunk of steel. You could make a hammer, or any number of other things, with the wood and steel.

Of course, while chunks of wood and steel are more “configurable” than a hammer, they aren’t terribly useful because few people have the specialized knowledge to work with such raw materials. Forecast Horizon’s purpose is to sit in the sweet spot between the two ends of the scale, and create a sort of “builder’s kit” made up of pre-designed components that can be used as-is or can be extensively reconfigured to suit your needs. Its design provides incredible flexibility while still allowing people who aren’t scientists to understand the science and make powerful decisions.

Tuesday, February 24, 2009

Rails Deployment Blues

While developing in Rails is a joy, deployment can still be an adventure. There is a very interesting post on the Heroku blog on this topic. Here is a short list of hiccups that I experienced, and that you, gentle reader, can avoid:
  • Slicehost is great, and a $20 slice is a good way to get started. I went with Ubuntu 8.10 (intrepid).
  • Start with the following steps to secure and set up
  • Figuring out what to install from source versus package is tough. After many false starts, it seems like the only thing you need to install from source is RubyGems. DHH notes this in his plug for Phusion, and he is right. Despite reports to the contrary, there are no issues with the Ruby 1.8.7 that comes from the Ubuntu package. MySQL and Apache worked fine from the package as well.
  • Once you have Ruby 1.8.7 and RubyGems 1.3.1, Phusion should install without a problem.
  • Ubuntu oddity: MySQL does not put mysql.sock under /tmp, as Rails expects. It puts a mysqld.sock under mysqld. Thank you, Herval. Also, very cool Phil Ochs lyrics on your page: one of my favorite songs.
  • If using Capistrano, make sure you have a non-GUI svn client on your local machine.
  • Deployment oddity: script/process/reaper was showing up as non-executable. I am still not sure why this was happening, but Topfunky had a good solution. I am really starting to see the power of Capistrano: you can bundle all these server side commands into a single file that runs from your local machine.
  • Even bigger deployment oddity: When everything is finally almost working, something remains amiss. I can start the app with WEBrick, but not with Phusion. One last change: config/environment.rb must be owned by www-data. Again, put it in a Capistrano block, but I have no idea why this is happening.