University is a privately endowed research university founded in 1865. Ranked
in the top one percent of universities in the world, Cornell is made up of 14
colleges and schools serving roughly 22,000 students.
Jeff Christen, data warehousing manager at Cornell University and adjunct faculty in Information Science, and Chris Stewart, VP and general manager, USA at WhereScape, talked with DATAVERSITY® about how Cornell dealt with the end-of-life for the primary product they used to manage their data warehouse.
The Primary Issue
Cornell was using Cognos Data Manager to transform and merge data into an Oracle Data Warehouse. IBM purchased Data Manager and decided to end support for the product. “Unfortunately, we had millions of lines of code written in Data Manager, so we had to shop around for a replacement,” said Christen. He looked at it as an opportunity to add new functionality so that their data warehouse ran more efficiently.
Christen’s IT team had to confine processing to hours when the university was
closed, so batch processing from financial, PeopleSoft, or student records
couldn’t start warehouse processing until the end of normal operations and had
to be completely finished by 8:00 a.m. when staff arrived as they needed access to the
getting really close. We were frequently bumping into that time,” said
Christen. Because their processing window was so short, errors and issues could
be very disruptive.
“Our old tool
would just log it if there was an issue, but then we couldn’t load the
warehouse, because some network glitch that probably took seconds was enough to
take out our nightly ETL processing,” elaborated Christen.
documentation was also a problem. Stewart said that they joke with their
customers about documenting a data warehouse. “There are two types of
documentation: nonexistent and wrong. People laugh, but nobody ever argues that
point because it’s the thing that people don’t like to do, so it rarely gets
done,” said Stewart.
Because it is an academic
institution, licensing and staffing costs were important factors for Cornell. Stewart often sees this in government and in higher
education organizations where the administration has increasing data needs, yet
the pool of available people is small, like Christen’s staff of four.
Stewart said that
automation can lift much of that workload so staff can get more accomplished in
a shorter amount of time. “You can’t just go out and
add two more people. If you have more work, you need to get more out of your
existing staff,” said Stewart.
Finding a Solution
Christen started to shop around for ETL tools, with an eye to adding some improvements. There were several key areas he focused on when evaluating vendors: documentation, licensing costs, improving performance and being able to work within existing staffing levels. In 2014, Christen attended the Higher Education Data Warehousing conference to research options.
WhereScape was one of the exhibitors at the conference and one of the features that caught his attention was its approach to documentation. “Our customers were used to having outdated and incomplete documentation, and that was something WhereScape definitely had a handle on,” he said.
Most of the products Cornell considered required licensing by CPU, which could prove cost-prohibitive as Cornell’s extensive data warehouse environment was scaled for end-user query performance.
“We have a
ton of CPUs,” Christen said. CPU-based licensing costs would be significant,
and they found themselves trying to figure out how to re-architect the entire
system to reduce the CPU footprint enough so that the licensing could work, a
process that would create other limitations. WhereScape’s license model is a
developer seat license, so with four full-time warehouse developers, they only
needed to purchase four named user licenses.
“There’s no separate license for the CPU run-time environment with WhereScape, so if we’re successful, we’ll get everything converted, but there’s no penalty for how we configure the warehouse for end-user performance or query performance,” Christen said.
Being able to integrate and use the product without increasing the number of developers was a clear advantage. “That’s has been a key driver for organizations evaluating automation for their teams,” Stewart added.
Cornell didn’t just rely on marketing material to make their decision. They did an on-site proof of concept where one of their developers worked with the product on a portion of their primary general ledger model. They discovered that WhereScape was intuitive enough that one of their ETL developers was able to code a parallel environment in the proof of concept with minimal assistance from WhereScape. The developer hadn’t gone through any formal training, which proved that the learning curve would be manageable.
The proof of concept allowed them to get a nearly apples-to-apples comparison, which showed “huge improvements” in load time performance compared to Data Manager. “So, it was a robust enough tool, but also intuitive enough that it could be mastered in a few weeks,” said Christen.
WhereScape helps IT organizations of all sizes leverage automation to design, develop, deploy and operate data infrastructure faster.
long ago that there were patterns in data warehousing that really transcend any
industry vertical or any size of company,” said Stewart.
Because the process of building a data warehouse out is primarily mechanical, and much of that is common among data warehousing organizations, WhereScape automates both the design and modeling of the data warehouse, all the way through to the physical build.
deployments, as you’re moving a project from development to quality assurance
environment (QA), and then on to production, we’re scripting all that out as
well,” said Stewart. These are all processes companies usually use multiple
tools to address – a resource-heavy process that can create a silo for each tool.
“We have one tool suite that covers data warehousing end-to-end and it’s just one set of tools to learn,” said Stewart. Instead of licensing separate tools to for each part of building a data warehouse, then finding a place to install all those tools, and spending weeks for staff training and management – teams have just one tool to learn and use. Handing off the build to WhereScape’s automated process frees up time and energy so that the business can take advantage of that data and produce useful analytics.
wins of the conversion from their traditional ETL tool to WhereScape allowed
Cornell to cut their nightly refresh times in half, or better, in some cases.
Although they didn’t start that way, they are now a 100 percent WhereScape
solution, with 100 percent Amazon-hosting as well.
“We did a
major conversion which took a few years to get to WhereScape from our old tool,
but that’s behind us. We’re running WhereScape on Amazon Web Services in their
Oracle RDS service,” said Christen.
just finished this conversion in the last year, since 2014 when they purchased
WhereScape, all new developments and enhancements have been done in WhereScape.
actually an option to fix the problem, restart it, and still complete before
business hours, which is a big win for our customers,” said Christen. “Essentially,
we’ve cut our refresh times in half, so not only can the team complete all the processing they need with their batch
windows, we’re not brushing up against business hours anymore.”
automatically generating documentation, WhereScape solved the problem of
outdated and incomplete documentation.
To take full
advantage of the automated documentation process, Cornell decided to build in
some new subject areas, but the speed of the tool outstripped their internal
modified waterfall approval process. Christen believes they can speed up their
process now that they can quickly put out a prototype. They can start receiving
feedback immediately from customers within days rather than weeks, and from
there, refine the model until they’re ready for production.
changing our practices now that we have some new abilities with WhereScape,”
said Christen. One of the next steps is to more fully leverage and market the
documentation so they can start providing their customers with more information
about the attributes that are available in the warehouse.
benefit is that Christen’s Business Intelligence Systems students get to use
WhereScape to learn Dimensional Data Modeling, ETL concepts, and Data
Visualization hands-on with real datasets.
teaching the concepts of automation so they learn the hard way, with SQL
statements, and then we use WhereScape and they can see how quickly they can
create these structures to build out real dimensional model data warehouses,”
that they’ve had inquiries from other universities that have heard about
Christen’s use of WhereScape in the classroom and are interested in
incorporating WhereScape into their curriculum, so the students can get more
work done in a semester.
similar benefit to what our customers are receiving in their ‘real-world’
application of automation, and it is giving students the chance to understand
the full data warehousing lifecycle,” said Stewart.
Image used under license from Shutterstock.com
Credit: Source link