Wednesday, July 23, 2008

Working with Excel data without working with Excel

I am currently working on an application to replace manual MS Excel files that get emailed around and while the business users are looking forward to an interactive web app they are also wedded to their excel files. One of the key requirements we need to meet is allowing them to move their data back and forth.

In the past I've worked with COM and Java libraries to manipulate the files in their native formats and both were painful experiences I did not want to relive. I'm going to talk about how we solved the problem without the need to work with any Office libraries.

Loading Excel data into our Application
The approach we took was to let the users extract the data themselves with a simple copy-and-paste. The user would copy the cells of the excel file and then paste that tab delimited plain text into an HTML form.

Let's say we're trying to import a list of people with a first and last name. The Excel sheet would look like this


We start with a form with a big text area and submit button and an area to show any errors we may encounter while parsing

<% form_tag do -%>
Paste your Excel data : <%= text_area_tag :data, @data %>
Click to validate the data <%= submit_tag 'Validate' %>

<ul>
<% @errors.each do |error| %>
<li><span ><%=error%></span></li>

<% end unless @errors.blank? -%>
</ul>
Then the controller action which let's the model parse the text and only saves if every person model is valid.
class PeopleController <  ApplicationController
def import
@errors = []
people = Person.all_from_tab_delimited_text(params[:data])

people.each do |person|
person.errors.each { |error| @errors << "#{error} on #{person.inspect}"} unless person.valid?
end
end
end
Now we move onto the Person model where the parsing actually takes place
class
class Person < ActiveRecord::Base
def Person.all_from_tab_delimited_text(data)
people = []
CSV::Reader.parse(@data, "\t") do |row|
person = Person.new(:first_name"> row[0], :last_name => row[1])
people << person
end
people
end
end
We now have a working allbeit simple implementation that even does some simple error checking to give the user feedback if their data cannot be imported. Of course the example here is simple and the real system has additional features and complexity that I didn't show here. Some of these are
  • The import into a 2-step operation where we first ask the user to click 'Validate' then return a table showing the data as we parsed it. This allows the user to review and verify it was parsed correctly (i.e. the columns were in the correct order). Only on this second page do we provide an 'import' button that actually persists the new data.
  • The data can not only create new people but also update existing ones. The example above with just first and last name is contrived so this wouldn't make sense but imagine an example where there are many other columns like address, phone number, etc. Our model code is a bit more complex in that it first tries to find a person with the same first and last names. If it succeeds it will then update that person and only creates a new one if there is no match.
  • Some of the additional columns in the real system refer to other models in our system. To deal with these our model code must find other kinds of model objects and pass those into the Person.new. Ensuring the proper errors are sent back to the user in this case is fairly tricky as they may not always show up as part of the ActiveRecord validations.
Downloading data from our app into Excel
Moving data in this direction was even easier as we just relied on the fact that Excel will open csv files as long as our Rails app returns an Excel MIME type. We add a line to register the mime type with the csv extension
# config/initializers/mime_types.rb
Mime::Type.register 'application/vnd.ms-excel', :csv
Then we had to do for this was add a respond_to.xsl in the appropriate places of our controller
class PeopleController < ApplicationController
def index
people = Person.all
respond_to do |format|
format.html # index.html.erb
format.csv do
render :text => @people.collect { |person| person.to_csv}.join("\n")
end
end
end
and then add a to_csv to the model
class Person < ActiveRecord::Base
def to_csv
"#{first_name}\t#{last_name}"
end
end

This only works for very simple Excel data on a single worksheet but when that's true it makes your job of working with that data so much simpler!

Sunday, July 13, 2008

How to unit test a Java class with static initializers

We have a large Java codebase that we're trying to put under test. Since this class was not designed for testability we often run into code that cannot be tested as is. Most often this involves code that assumes it will always be run inside our J2EE Application Server because it depends on classes provided by the server.

I was recently working with someone on this Java class
class Person {
static {
Logger logger = new ContainerLogger();
}

public String doSomething() {
...
logger.info("I did something");
...
}
}
We couldn't create a unit test for the doSomething method because the ContainerLogger class was provided by our application server and couldn't be used outside the container. We refactored to come up with this solution.
class MyClass {
public static Logger _logger;
public Logger logger() {
if (_logger == null) {
logger = new ContainerLogger();
}
return _logger;
}

public String doSomething() {
...
logger().info("I did something");
...
}
}
We refactored our code to access the logger using the accessor method logger() then we also made the logger variable itself public at the same time. Why did we do those two seemingly contradictory things? The accessor method implements the singleton pattern preserving the semantics of the static initializer and the public access to the instance variable _logger allows us to replace the implementation with a mock in our test.
public class TestMyClass extends TestCase {
public void testSomething() {
MyClass._logger = new MockLogger();
assert(MyClass.doSomething(), "expected return");
}
}
I've seen too much legacy Java code that assumes it will always run inside a container but with testing we need to change our mindsets because when running a test we will be outside the container. All dependencies on the container (and in most applications I've seen we don't need that many!) should be encapsulated and able to be mocked through dependency injection. If you do this you'll end up with simpler code that better follows the Single Responsibility Principle

Friday, July 4, 2008

Some things are inherently complicated and slow ... put them in the right place

One of the first lessons I learned when I started working as a software developer professionally was that you don't always have to make an action fast, sometimes its just enough to make it seem fast to a user. This lesson was learned in the 1990s in the context of a Windows application. What we did was quickly draw part of the screen (or a splash image) immediately then do the time consuming work in the background so that by the time the user was ready to interact with the application we'd be ready for them.

Recently I was reminded of this lesson on a website I'm working on. We have a complicated report to show on the user's homepage. My first implementation involved generating the report in real-time when the page is loaded but this was very slooooow. Some profiling and analysis let us make it somewhat faster but not fast enough.

I was stumped until I remembered my old lesson. If it wasn't possible to generate the report quickly, perhaps we could do it sometime when the user wouldn't mind.

Luckily this is a Rails application and ActiveRecord Callbacks make it easy to do this. I could pregenerate the report and update a portion of it each time something is saved. Users expect a save to take some time and don't do it that often. Then generating the homepage becomes just a simple matter of displaying the existing rows.

First I setup my models to call into the report generator every time they change

class Person < ActiveRecord::Base
after_destroy {|person| ReportGenerator::Person.destroyed(person)}
after_create {|person| ReportGenerator::Person.created(person) }
after_update {|person| ReportGenerator::Person.updated(person) }
end


Then I implement the complicated (and slow) logic to pre-compute the rows in the report in a library class


module ReportGenerator
class Executive
def destroyed(executive)
#figure out which rows to delete from the report and persist with the Report model
end
def created(executive)
#figure out which rows to add to the report and persist with the Report model
end
def updated(executive)
#figure out which rows to update in the report and persist with the Report model
end
end
#Similar classes corresponding to the other models that trigger recalculations would go here
end


Finally I can show the report on the homepage with some boring (and fast) Rails code

Model:

class Report < ActiveRecord::Base
end


Controller:

class ReportsController < ApplicationController
@report =" ReportRows.find_by_user(current_user)">


and the View:


<% for row in @report %>
Display the row
<% end %>


The interesting insight for me is that when optimizing there are sometimes hard problems that can't be solved. Its important not to lose sight of the goal you're aiming at (a satisfying user experience) and that sometimes involves spending the computational time somewhere where the user won't mind.