Excel is a great tool for quick, ad hoc analysis of data. However, business users often embed entire end-to-end processes in Excel because of the ease of making instant changes to the data, calculations, and outputs. The usual issue in replacing Excel-driven processes is that business requirements constantly change and users need to represent those changes within a few hours, not the few weeks it can take for delivery by an engineering team.
A standard solution proposed for reducing reliance on Excel is to replace Excel VBA with Python. However, without a fundamental rethink of the process flows – using for example the pandas library instead of Excel formulae for data analysis – the only real change is the programming syntax. A Python solution (e.g. from the Openpyxl library) would still use Excel’s object library (via COM) to produce automations within Excel; and the processing would still occur within Excel, such as refreshing pivot tables, recalculating formulae etc. Python scripts can be version controlled in a Git; however this can also be true of Excel add-ins written in VBA, using Git hooks. It is even possible to use VSCode as an IDE for VBA projects, rather than using Excel’s embedded VB Editor. Without a rethink of the overall process, a change of programming language is not a strategic solution to the underlying data control issues.
A strategic solution would include:
- All data persisted in a database, where it is better controlled, more secure, and can be analysed more easily;
- Data flows clearly defined and configurable;
- Transparent, reviewable, and version-controlled business logic;
- Reporting represented in an interactive data visualisation application, such as Power BI.
I have not yet encountered an off-the-shelf solution that solves the spreadsheets problem; and I think that such a product and supporting services would have enormous demand.
A Business Logic application between a data warehouse and Excel could visually represent the business logic, enabling non-technical users to:
- Define and apply secondary calculations;
- Join and group data (with controls in place to avoid dropped and duplicated rows);
- Filter and configure output datasets;
- Order the process flows sequentially.
An Excel add-in could call the resulting reports to Excel for review and analysis. The add-in could enable ad hoc data (resulting from the review) to be uploaded into the Business Logic app, such as adjustments and new types of reference data. All ad hoc data changes would then be tracked and made transparent, rather than obscured in the logic of an Excel workbook.
Over time, machine learning would have a greater role in suggesting and optimising the business logic.