Data Conversion with Python: Conversion at the Speed of Innovation

Created on September 20, 2016
Last updated on December 14th, 2021 at 8:26 am by James Kilpatrick


Right now, as you read these words, a company somewhere is preparing for go-live on a new ERP implementation. No matter the size of the company, the preparedness of the project team, or the quality of the solution, the days leading up to go-live are fraught with stress and anxiety. As I write this blog, my phone has been buzzing with emails concerning an imminent go-live at a company that we’ll call Company J. Let’s say they manufacture knockoff Jordache jeans and Members Only jackets, with legacy systems from the same era.

I’ve touched on the story of “Company J” before. When we last left them, they had taken ownership of a tool I developed in Python that automates data validation. They are now close to go-live and have already converted most of their master data to their new SAP solution. Every object they’ve loaded into SAP has been validated using Python. Validations that took multiple people as long as two weeks to complete in Excel (obviously not an acceptable state of affairs in a fast-paced go-live) now take hours or even minutes. The tool outputs consistent, easy-to-read data quality reports – much better than getting a verbal “it looks fine” from a business analyst who passed out at their desk at 2 AM the night before. But I still get asked: why Python?
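To make the idea concrete, here is a minimal sketch of rule-based validation that produces a summary report, using only Python's standard library. It is not the Company J tool; the field names (customer_id, name, country), the allowed country codes, and the functions validate_record and quality_report are hypothetical.

```python
from collections import Counter

# Hypothetical rules for a customer master record. The field names and
# allowed values are illustrative only, not taken from any real SAP object.
REQUIRED_FIELDS = ("customer_id", "name", "country")
ALLOWED_COUNTRIES = {"US", "CA", "MX"}


def validate_record(record):
    """Return a list of human-readable problems found in one record."""
    problems = []
    for field in REQUIRED_FIELDS:
        # Treat None, empty strings, and whitespace-only values as missing.
        if not (record.get(field) or "").strip():
            problems.append(f"missing {field}")
    country = (record.get("country") or "").strip()
    if country and country not in ALLOWED_COUNTRIES:
        problems.append(f"invalid country {country!r}")
    return problems


def quality_report(records):
    """Summarize how many records failed and how often each problem occurred."""
    problem_counts = Counter()
    failed_ids = []
    for record in records:
        problems = validate_record(record)
        if problems:
            failed_ids.append(record.get("customer_id") or "<no id>")
            problem_counts.update(problems)
    return {
        "total": len(records),
        "failed": len(failed_ids),
        "failed_ids": failed_ids,
        "by_problem": dict(problem_counts),
    }


records = [
    {"customer_id": "C001", "name": "Acme Denim", "country": "US"},
    {"customer_id": "C002", "name": "", "country": "US"},
    {"customer_id": "C003", "name": "Retro Outerwear", "country": "XX"},
]
report = quality_report(records)
print(report)
```

The report is a plain dictionary here; writing it out as a spreadsheet or formatted document would be a separate step.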

Why Python, Indeed?

No matter who you ask, Python is one of the five most popular programming languages in the world. The IEEE ranks Python 3rd, a mere tenth of a percentage point behind Java [1]. The PYPL index ranks it 2nd, and Python has been trading places with C# for 4th in the TIOBE index for more than a year now. Python is used by Google, Yahoo, and NASA; it powers the websites of Instagram, Pinterest, and The New York Times [2]. Much of this ubiquity comes from the fact that Python is very easy to learn – a quick foray into Codecademy has any learner writing and running Python code within seconds – and yet it is still very powerful.

Python’s design makes that power easy to unleash. One of Python’s defining characteristics is summed up in its motto: “batteries included”. Python’s standard distribution ships with ready-made modules for many common tasks, such as reading CSV files or working with dates, which greatly shortens development time and makes following best practices the path of least resistance. More specialized tasks are covered by a rich and versatile ecosystem of free, open-source third-party packages – in my previous blog, I referred to one such package that adds tools for high-performance data manipulation.
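As a small illustration of “batteries included”, the sketch below flags duplicate keys in a legacy extract using only the standard library's csv and collections modules. The sample data and the function name duplicate_keys are hypothetical; in practice the extract would be read from a file rather than an in-memory string.

```python
import csv
import io
from collections import Counter

# Hypothetical sample extract. With a real file you would pass
# open("extract.csv", newline="") instead of io.StringIO(SAMPLE).
SAMPLE = """material,description,plant
M-100,Denim jacket,1000
M-200,Bomber jacket,1000
M-100,Denim jacket (dup),2000
"""


def duplicate_keys(csv_file, key):
    """Return the values of `key` that appear more than once, with their counts."""
    reader = csv.DictReader(csv_file)
    counts = Counter(row[key] for row in reader)
    return {value: n for value, n in counts.items() if n > 1}


dupes = duplicate_keys(io.StringIO(SAMPLE), "material")
print(dupes)  # {'M-100': 2}
```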

Python is also known for its emphasis on readable code. Many Python idioms exist for the sole purpose of making code legible – and legible code is code that can be changed and maintained. Compared to many other languages – especially the languages commonly encountered on ERP projects – Python allows for faster development and better code performance. When every minute counts, the rapid development that Python enables is crucial to success.
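For example, here is the same check (finding the row numbers of records with a blank description) written first as an index-based loop and then with Python idioms. The data and field names are hypothetical.

```python
# Hypothetical records; row numbers are 1-based, as a business user would count them.
rows = [
    {"material": "M-100", "description": "Denim jacket"},
    {"material": "M-200", "description": ""},
    {"material": "M-300", "description": "  "},
]

# Index-based loop, written the way it might be in many other languages.
blank_rows_loop = []
i = 0
while i < len(rows):
    if rows[i]["description"].strip() == "":
        blank_rows_loop.append(i + 1)
    i += 1

# Idiomatic Python: enumerate() supplies the row number, and a list
# comprehension states the intent in a single expression.
blank_rows = [
    n for n, row in enumerate(rows, start=1) if not row["description"].strip()
]

assert blank_rows == blank_rows_loop
print(blank_rows)  # [2, 3]
```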

Popularity Comes from People

Another major argument for Python: it’s an increasingly common subject in undergraduate curricula and an accordingly common skill in the workforce. Python is now the most popular language taught in introductory-level college classes[3]. Even MIT has made the switch to Python. But it’s not just computer science majors learning Python. Forward-thinking universities have recognized that all their alumni must have coding skills to be competitive in the workforce, no matter their major. For example, Georgia Tech now requires every student to take an introductory course in Computer Science – typically one that teaches Python.

Python’s popularity means that hiring a Python developer is typically significantly cheaper than hiring an ABAP developer. There are more people in the workforce who know Python than ABAP and they are usually earlier in their careers. In fact, Python has become so popular that some managers I’ve spoken with have found that – to their surprise – a number of their Business Analysts already know or are familiar with Python.

The Gotchas – and How to Avoid Them

Python, like any other technology, has its limitations and pitfalls. One common critique of Python is that it’s difficult to package Python code into a single executable. There are solutions to this problem (for example, PyInstaller), but many journeyman Python developers find packaging confusing compared to the build tooling of compiled languages.
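PyInstaller bundles a script and its dependencies; a typical invocation is `pyinstaller --onefile count_rows.py`, which produces a single executable. The script below is a hypothetical toy (count_rows.py and its functions are illustrative names) showing the usual shape of an entry point that such a tool can package: an argparse command-line interface plus an `if __name__ == "__main__":` guard.

```python
import argparse
import csv


def count_rows(lines):
    """Count data rows (excluding the header) in an iterable of CSV lines."""
    return sum(1 for _ in csv.DictReader(lines))


def main(argv=None):
    """Hypothetical command-line entry point: print the row count of each CSV file."""
    parser = argparse.ArgumentParser(description="Toy CSV row counter.")
    parser.add_argument("paths", nargs="*", help="CSV files to check")
    args = parser.parse_args(argv)
    if not args.paths:
        parser.print_usage()
        return
    for path in args.paths:
        with open(path, newline="") as f:
            print(f"{path}: {count_rows(f)} data rows")


# The bundled executable runs this file as __main__, so the guard calls main().
if __name__ == "__main__":
    main()
```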

Another common criticism casts Python as being slow. When put next to C, C++, or Java, Python is indeed far slower – often by an order of magnitude. Python developers mitigate this with packages that use C “under the hood”. This permits the developer to combine the fast development cycles of Python with the high performance of C. Learning C might appear to be the superior answer, but while writing basic C is relatively easy, writing fast C is very difficult for the inexperienced. C also lacks many of the properties that make Python code so fast to write and easy to read.
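The same principle shows up in the standard library itself. In CPython, built-ins such as sum() are implemented in C, so the sketch below compares a hand-written Python loop with sum() on the same data. Exact timings depend on the machine and Python version, so no figures are claimed here; third-party packages apply the same idea at a much larger scale.

```python
import timeit

values = list(range(1_000_000))


def python_loop_sum(numbers):
    """Add numbers one at a time in interpreted Python code."""
    total = 0
    for n in numbers:
        total += n
    return total


# In CPython, the built-in sum() is written in C, so its loop runs as
# compiled code instead of going through the bytecode interpreter.
assert python_loop_sum(values) == sum(values)

loop_time = timeit.timeit(lambda: python_loop_sum(values), number=10)
builtin_time = timeit.timeit(lambda: sum(values), number=10)
print(f"Python loop: {loop_time:.3f}s  built-in sum(): {builtin_time:.3f}s")
```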

For large data conversion workloads, however, the performance argument is largely irrelevant. Most data validation for conversion is I/O-bound: it is limited not by how fast the code runs but by how long it takes to read from and write to disk. The speed difference between C and Python is purely academic in this context, since both take the same amount of time to access data from disk. Just swapping a slow hard drive for a fast solid-state drive cuts data validation times by up to 90%. And since Python can leverage C for the compute-intensive parts of the code, Python is more than a match for the task at hand.
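One way to check whether a validation job is I/O-bound is to time the reading phase separately from the checking phase. The sketch below does that with time.perf_counter(); the function profile_phases, the generated sample file, and the rule that a trailing comma means a missing plant are all hypothetical. A file that was just written is probably still in the operating system's cache, so its read time will understate a cold read from disk; for a realistic measurement, point it at a real extract.

```python
import os
import tempfile
import time


def profile_phases(path, check):
    """Time reading a file separately from validating its lines.

    `check` takes one line and returns True if the line passes.
    Returns elapsed seconds for each phase plus the number of failures.
    """
    start = time.perf_counter()
    with open(path, encoding="utf-8") as f:
        lines = f.read().splitlines()
    read_seconds = time.perf_counter() - start

    start = time.perf_counter()
    failures = sum(1 for line in lines if not check(line))
    check_seconds = time.perf_counter() - start

    return {"read_s": read_seconds, "check_s": check_seconds, "failures": failures}


# Build a throwaway sample file so the example is self-contained:
# every tenth line has an empty plant column.
with tempfile.NamedTemporaryFile(
    "w", suffix=".csv", delete=False, encoding="utf-8"
) as tmp:
    for i in range(100_000):
        tmp.write(f"M-{i},Denim jacket,{'' if i % 10 == 0 else '1000'}\n")
    path = tmp.name

result = profile_phases(path, check=lambda line: not line.endswith(","))
os.remove(path)
print(result)
```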

Keeping Python Development Simple

Any piece of standalone development can add a layer of complexity to a project. Modern software development methodology mitigates this complexity. Using a versioned code repository such as GitHub or Bitbucket, relying on test-driven development, and writing code with an emphasis on readability and reuse are all standard techniques that cost nothing but a little forethought. In my experience, Python development is more reliable than ABAP. Python’s idioms focus on keeping code compact, legible, and maintainable; ABAP development, by comparison, can result in code that is sprawling and difficult to read. In fact, the Python code I wrote to perform data validation came to fewer than 500 lines, and much of that was code that automatically generated data quality reports in an Excel document. The end result was simple and easy to maintain – and arguably less complex than using VLOOKUPs across a plethora of Excel spreadsheets.
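Replacing a VLOOKUP in Python typically means building a dictionary from the reference table and checking each record against it. The sketch below pairs such a check with unit tests written using the standard library's unittest module, in the spirit of test-driven development. The function unmatched_keys and the plant and material data are hypothetical.

```python
import unittest


def unmatched_keys(records, lookup, key):
    """Return the records whose `key` value has no match in `lookup`.

    This plays the role of a VLOOKUP that comes back #N/A: `lookup` is a
    dict built once from the reference table, so each check is a hash lookup.
    """
    return [r for r in records if r[key] not in lookup]


class UnmatchedKeysTest(unittest.TestCase):
    def setUp(self):
        # Hypothetical plant reference table and material records.
        self.plants = {"1000": "Main plant", "2000": "Warehouse"}
        self.materials = [
            {"material": "M-100", "plant": "1000"},
            {"material": "M-200", "plant": "9999"},
        ]

    def test_reports_missing_plant(self):
        missing = unmatched_keys(self.materials, self.plants, "plant")
        self.assertEqual([r["material"] for r in missing], ["M-200"])

    def test_all_matched(self):
        self.assertEqual(unmatched_keys(self.materials[:1], self.plants, "plant"), [])


if __name__ == "__main__":
    # argv and exit=False keep unittest from reading the real command line
    # or exiting the interpreter after the tests run.
    unittest.main(argv=["unmatched_keys"], exit=False)
```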

And There’s More…

Python code can go beyond providing an engine for data validation. Thanks to the flexibility of Python and the power of its third-party libraries, it’s actually possible to fully automate data conversion with Python. Next time, I’ll show you how. In the meantime, I’ll be wearing my knockoff Members Only jacket while cruising around in my DeLorean. See you then!

Notes:

[1] “The 2016 Top Programming Languages” by Stephen Cass

[2] “5 Simple Coding Languages To Learn For First-Time Learners” by Kavina Ivers

[3] “Python is Now the Most Popular Introductory Teaching Language at Top U.S. Universities” by Philip Guo
