Modernization Code Converter

  _____        __      _____                      __         
 / ___/__  ___/ /__   / ___/__  ___ _  _____ ____/ /____ ____
/ /__/ _ \/ _  / -_) / /__/ _ \/ _ \ |/ / -_) __/ __/ -_) __/
\___/\___/\_,_/\__/  \___/\___/_//_/___/\__/_/  \__/\__/_/

Overview

Let’s face it, modernization efforts are a necessary evil. They are large, and complicated, and add no business value. Whatsoever. Until your application doesn’t work anymore. Then it is too late to start modernizing.

In the case of Indiana DWD, they have 2 million lines of code still running on Java 8. Last I checked we are on 24. This isn’t a large-scale application by any means. I’ve seen much, much larger. But it’s not trivial, either. If you compare lines of code to words, it’s the equivalent of the Wheel of Time series. And with the half-life of software sitting at around two and a half years, that’s a lot of modernizing. There is a definite solution to this problem, but that’s the subject for another post.

So I did what any architect would do. I proposed a meta-programming solution. It’s been a long time since I wrote code. I now write code that writes code. Or have AI write code that writes code, that writes… you get the idea.

It’s a lot of code, but ultimately every Java EE project that needs to be modernized is pretty predictable. An anemic domain model, with a DAO layer, services, controllers and a front end. SOAP, REST, hell, even microservices all follow this same pattern. Write a lot of garbage code, and leave, because writing code is fun, and reading it is painful. But that’s the next developer’s problem. The problem is, usually you are the next developer. The good news is, as I said, it is totally predictable.

This predictability is what makes automation so effective. Parse the code into a syntax tree, transform it, and print it back out. As modernized code. You don’t even have to play around with Lexers and Parsers. Java has been around for so long there are numerous libraries that do it for you. JavaParser is a good library, so I started with that.

Getting Started

The first step may not be so obvious. Write a printer that spits the code back out, unchanged. For a language like Java, this is around 200 to 300 lines of code. Then make sure everything still compiles. If you skip this step, you will regret it, I promise.

The next step is to generate unit tests. I’m not sure why, but legacy code always seems to have few or no unit tests. At this point you are not unit testing for correctness, but rather to ensure that you have a baseline for refactoring. I like to think of this process as turning a shirt inside-out. Close to 100% coverage can be generated fairly easily for almost any Java class.

Yes, I generated coverage for getters and setters. I know there is a big argument about whether unit tests are needed for getters and setters. They are easy to do. And, yes, I found several hundred defects relating to getters and setters in the code base. Generating a unit test for other methods can get tricky, but not quite as tricky as the average LeetCode question. If you want to take a shortcut, it’s fairly easy to set up a Copilot API call: send it the method and have it return the unit tests.
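To make the getter and setter case concrete, here is a minimal sketch of a test generator. It is not the generator from this project: it uses plain reflection rather than a syntax tree, and AccessorTestGenerator, SampleEntity, and the sample-value table are all made up for illustration. It emits JUnit 5 source text, one round-trip test per getter/setter pair.

```java
import java.lang.reflect.Method;
import java.util.Arrays;
import java.util.Comparator;
import java.util.Map;

/** Hypothetical bean standing in for a legacy entity. */
class SampleEntity {
    private String name;
    private Long id;
    public String getName() { return name; }
    public void setName(String name) { this.name = name; }
    public Long getId() { return id; }
    public void setId(Long id) { this.id = id; }
}

/** Emits JUnit 5 source text with one round-trip test per getter/setter pair. */
class AccessorTestGenerator {

    // One sample literal per supported type; any other type is skipped.
    private static final Map<Class<?>, String> SAMPLES = Map.of(
            String.class, "\"TEST\"",
            Long.class, "0L",
            long.class, "0L");

    static String generate(Class<?> bean) {
        String type = bean.getSimpleName();
        StringBuilder out = new StringBuilder("class " + type + "Test {\n");
        Method[] methods = bean.getDeclaredMethods();
        // Reflection does not guarantee method order, so sort for stable output.
        Arrays.sort(methods, Comparator.comparing(Method::getName));
        for (Method setter : methods) {
            if (!setter.getName().startsWith("set") || setter.getParameterCount() != 1) {
                continue;
            }
            String suffix = setter.getName().substring(3);
            Class<?> fieldType = setter.getParameterTypes()[0];
            String sample = SAMPLES.get(fieldType);
            if (sample == null || !hasGetter(bean, "get" + suffix, fieldType)) {
                continue;
            }
            out.append("    @org.junit.jupiter.api.Test\n")
               .append("    void roundTrips").append(suffix).append("() {\n")
               .append("        ").append(type).append(" entity = new ").append(type).append("();\n")
               .append("        entity.set").append(suffix).append("(").append(sample).append(");\n")
               .append("        org.junit.jupiter.api.Assertions.assertEquals(")
               .append(sample).append(", entity.get").append(suffix).append("());\n")
               .append("    }\n");
        }
        return out.append("}\n").toString();
    }

    // A setter only gets a test when a getter of the same type exists.
    private static boolean hasGetter(Class<?> bean, String name, Class<?> type) {
        try {
            return bean.getDeclaredMethod(name).getReturnType() == type;
        } catch (NoSuchMethodException e) {
            return false;
        }
    }
}
```

Calling AccessorTestGenerator.generate(SampleEntity.class) returns the text of a SampleEntityTest class. A setter that has no getter of the same type is silently skipped here; a real generator would probably want to report it, since that is exactly the kind of defect these tests tend to surface.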

Converting

So, what types of conversions can you do? Everything ranging from simple refactors, like introducing the “var” keyword or adding the diamond operator, all the way to large structural changes. Using the Visitor pattern as a basis for the printers, the simplest changes are only a few lines of code, and the largest are seldom larger than a few dozen.
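Here is a rough sketch of why a conversion can be that small. It uses a toy two-node syntax tree instead of JavaParser’s, and every type name in it (SketchNode, SketchVisitor, VarDecl, NewExpr, IdentityPrinter, DiamondPrinter) is invented for the example. One printer reproduces the code unchanged. The diamond-operator conversion is a subclass that overrides a single visit method.

```java
import java.util.List;

/** Toy AST node; a real converter would use its parser library's node types. */
interface SketchNode {
    <R> R accept(SketchVisitor<R> visitor);
}

/** One visit method per node type. */
interface SketchVisitor<R> {
    R visit(VarDecl node);
    R visit(NewExpr node);
}

/** Models an object creation such as: new ArrayList<String>() */
final class NewExpr implements SketchNode {
    final String type;
    final List<String> typeArgs;

    NewExpr(String type, List<String> typeArgs) {
        this.type = type;
        this.typeArgs = typeArgs;
    }

    @Override
    public <R> R accept(SketchVisitor<R> visitor) {
        return visitor.visit(this);
    }
}

/** Models a declaration such as: List<String> names = new ...; */
final class VarDecl implements SketchNode {
    final String declaredType;
    final String name;
    final NewExpr initializer;

    VarDecl(String declaredType, String name, NewExpr initializer) {
        this.declaredType = declaredType;
        this.name = name;
        this.initializer = initializer;
    }

    @Override
    public <R> R accept(SketchVisitor<R> visitor) {
        return visitor.visit(this);
    }
}

/** Prints the tree back out unchanged. */
class IdentityPrinter implements SketchVisitor<String> {
    @Override
    public String visit(VarDecl node) {
        return node.declaredType + " " + node.name + " = " + node.initializer.accept(this) + ";";
    }

    @Override
    public String visit(NewExpr node) {
        String args = node.typeArgs.isEmpty() ? "" : "<" + String.join(", ", node.typeArgs) + ">";
        return "new " + node.type + args + "()";
    }
}

/** The whole conversion is one overridden method. */
class DiamondPrinter extends IdentityPrinter {
    @Override
    public String visit(NewExpr node) {
        // Only explicit type arguments are rewritten; raw types are left alone.
        return node.typeArgs.isEmpty() ? super.visit(node) : "new " + node.type + "<>()";
    }
}
```

In this toy tree a NewExpr only ever appears as the initializer of a typed declaration, so the diamond is always safe. A real converter also has to check the context, because the diamond is not valid everywhere an explicit type argument is (anonymous classes before Java 9, for example).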

I wanted to introduce Kotlin to the domain model, because of its compatibility with the JVM, and its clarity and readability. Indiana wasn’t ready for Kotlin, so we settled on Lombok. They were already using it, and I like it. It’s a great compromise. A few lines of code to conditionally add annotations and remove boilerplate methods, and the code is already much more readable. With a comprehensive unit test suite in place, we can proceed with confidence.

Of course, you WILL run into problems. The code is predictable, but not THAT predictable. Developers do weird stuff like having multiple getters for the same field, differing only by case. I added audit loggers to output the methods that failed, and looped that output back into the converter. On the next run, the converter leaves that code alone and adds a TODO to fix the problematic code.
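The feedback loop can be sketched roughly like this. The class name SkipListConverter, the "Class#method" key format, and the TODO wording are all assumptions, and so is the way a failure is detected: here a failure is simply an exception thrown by the transformation, whereas a real pipeline might flag a method because its generated tests fail after conversion.

```java
import java.util.Set;
import java.util.TreeSet;
import java.util.function.UnaryOperator;

/** Wraps a transformation with an audit trail that feeds the next run. */
class SkipListConverter {
    private final Set<String> skipList;                       // failures from the previous run
    private final Set<String> failures = new TreeSet<>();     // failures seen in this run
    private final UnaryOperator<String> transform;

    SkipListConverter(Set<String> skipList, UnaryOperator<String> transform) {
        this.skipList = new TreeSet<>(skipList);
        this.transform = transform;
    }

    String convert(String methodKey, String source) {
        if (skipList.contains(methodKey)) {
            // Known troublemaker: keep the original code and flag it for a human.
            return "// TODO: convert by hand, automated conversion failed earlier\n" + source;
        }
        try {
            return transform.apply(source);
        } catch (RuntimeException e) {
            failures.add(methodKey);   // audit entry for the next run's skip list
            return source;             // leave the code untouched this time
        }
    }

    Set<String> failures() {
        return new TreeSet<>(failures);
    }
}
```

The first run collects failures(), which would be written to a file. The second run is constructed with that set as its skip list, so the odd methods pass through untouched with a TODO on top.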

Hibernate to JPA

Converting from Hibernate wasn’t much more challenging technically than the Lombok converter. You have to read in all of the hbm files, converting the XML to annotations. Then in the JPA printer, you look up the annotations by class and field name. There are small things to consider. Your lookup needs to make sure each entity is mapped only once. JPA doesn’t like it if you have two entities with the same name. I said it is technically not harder than the Lombok converter, but you really need to know JPA in depth to make sure you get the annotations completely correct. The hardest annotation, without a doubt, was @ManyToOne with the inverse mapping in the other class.
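As an illustration of the lookup, here is a minimal sketch that reads a small hbm-style document with the JDK’s DOM parser and indexes annotation text by class and field. HbmAnnotationIndex and its key format are hypothetical. It handles only id, property, and many-to-one, and assumes every element carries an explicit column attribute. It also ignores the DOCTYPE that real hbm files carry, the package attribute on hibernate-mapping, and the inverse side of the relationship, which is the hard part.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.SAXException;

/** Index of JPA annotation text, keyed by "Class" or "Class#field". */
class HbmAnnotationIndex {
    private final Map<String, List<String>> annotations = new LinkedHashMap<>();

    void load(String hbmXml) {
        Document doc;
        try {
            doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new ByteArrayInputStream(hbmXml.getBytes(StandardCharsets.UTF_8)));
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalArgumentException("Unreadable mapping", e);
        }
        NodeList classes = doc.getElementsByTagName("class");
        for (int i = 0; i < classes.getLength(); i++) {
            Element cls = (Element) classes.item(i);
            String className = cls.getAttribute("name");
            // Each entity may be mapped only once across all files.
            if (annotations.containsKey(className)) {
                throw new IllegalStateException("Entity mapped twice: " + className);
            }
            annotations.put(className,
                    List.of("@Entity", "@Table(name = \"" + cls.getAttribute("table") + "\")"));
            NodeList children = cls.getChildNodes();
            for (int j = 0; j < children.getLength(); j++) {
                if (children.item(j) instanceof Element) {
                    mapMember(className, (Element) children.item(j));
                }
            }
        }
    }

    private void mapMember(String className, Element member) {
        String key = className + "#" + member.getAttribute("name");
        String column = "name = \"" + member.getAttribute("column") + "\"";
        switch (member.getTagName()) {
            case "id":
                annotations.put(key, List.of("@Id", "@Column(" + column + ")"));
                break;
            case "property":
                String length = member.getAttribute("length");
                String suffix = length.isEmpty() ? "" : ", length = " + length;
                annotations.put(key, List.of("@Column(" + column + suffix + ")"));
                break;
            case "many-to-one":
                annotations.put(key, List.of("@ManyToOne", "@JoinColumn(" + column + ")"));
                break;
            default:
                break; // other hbm elements are out of scope for this sketch
        }
    }

    /** Called by the printer while it visits a field. */
    List<String> lookup(String className, String field) {
        return annotations.getOrDefault(className + "#" + field, List.of());
    }
}
```

The duplicate check here is on the fully qualified class name. Since a JPA entity name defaults to the unqualified class name, a stricter version would also reject two classes that share a simple name.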

Done?

So the code compiles and all 4k unit tests pass. We are done, right? Right? Not even close. To ensure the conversion is working properly, you need integration tests. Can you create the entity? Read? Update and delete? Do the named queries work properly?

The starting point for the Integration Test generator was the Unit Test printer. First we need to set up the infrastructure. I used an in-memory H2 database, so technically it wasn’t an “integration test” so much as a component test. Changing to a real database was simply a matter of tweaking a few properties.

The tests should be able to run in any order, so this requires setup and teardown for each test. There should be one test for create and read, one update test for each field, and one test for delete. For example, an update test would create the database, add the entity, assert the value of the field, update the entity, read the entity, and assert the value changed.

For the unit tests, I could get away with one value for each type. String is “TEST”, Long is 0L, etc. This doesn’t work for integration tests. If the field is mapped with @Column(name = "test", length = 1), the value “TEST” will fail. “T” would pass, but I wanted to make the tests a little nicer than that. There’s test data in the database. Use it. I created an adapter to read the most used values from each table in the database, and created sample data for each entity.
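A minimal sketch of that value selection, with the database left out: the observed values are passed in as a plain list, where the real adapter would get them from a query against the table. SampleValues and its method names are made up for the example. It picks the most frequent observed value, falls back to “TEST” when there is no data, and truncates the result to the column length.

```java
import java.util.List;
import java.util.Map;
import java.util.Objects;
import java.util.Optional;
import java.util.TreeMap;
import java.util.stream.Collectors;

/** Picks a sample value for a String column with a maximum length. */
class SampleValues {

    /** Truncates a candidate so it fits the mapped column length. */
    static String fit(String candidate, int maxLength) {
        return candidate.length() <= maxLength ? candidate : candidate.substring(0, maxLength);
    }

    /** Most frequent non-null value, or empty when nothing was observed. */
    static Optional<String> mostUsed(List<String> observed) {
        Map<String, Long> counts = observed.stream()
                .filter(Objects::nonNull)
                .collect(Collectors.groupingBy(v -> v, TreeMap::new, Collectors.counting()));
        return counts.entrySet().stream()
                .max(Map.Entry.comparingByValue())
                .map(Map.Entry::getKey);
    }

    /** Real data when available, the unit-test default otherwise. */
    static String sampleFor(List<String> observed, int maxLength) {
        return fit(mostUsed(observed).orElse("TEST"), maxLength);
    }
}
```

With no data for a length-1 column, sampleFor returns “T”, which is the bare-minimum value mentioned above. With data it returns whatever the table uses most.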

Miscellaneous

Not strictly necessary, but a code review suggested I create test fixture constants, eliminating duplicate values between the unit and integration tests. That was fun.

I also enjoyed creating the REPL. It made the conversion process more convenient. I added commands to enable and disable generators, modify test values, show values, and run each step of the process. Plus it makes for a good demo. That’s always helpful.
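For a sense of how little a REPL like this needs, here is a small sketch. The command names and syntax (enable, disable, set, show, quit) are guesses for illustration, not the actual command set, and the commands that run each step of the process are left out.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.PrintStream;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

/** Line-oriented command loop around the converter's settings. */
class ConverterRepl {
    private final Set<String> enabledGenerators = new TreeSet<>();
    private final Map<String, String> testValues = new TreeMap<>();

    /** Executes one command line and returns the text to display. */
    String execute(String line) {
        String[] parts = line.trim().split("\\s+", 3);
        switch (parts[0]) {
            case "enable":
                if (parts.length < 2) return "usage: enable <generator>";
                enabledGenerators.add(parts[1]);
                return "enabled " + parts[1];
            case "disable":
                if (parts.length < 2) return "usage: disable <generator>";
                enabledGenerators.remove(parts[1]);
                return "disabled " + parts[1];
            case "set":
                if (parts.length < 3) return "usage: set <type> <value>";
                testValues.put(parts[1], parts[2]);
                return parts[1] + " = " + parts[2];
            case "show":
                return "generators=" + enabledGenerators + " values=" + testValues;
            default:
                return "unknown command: " + parts[0];
        }
    }

    /** Reads commands until end of input or "quit". */
    void run(BufferedReader in, PrintStream out) throws IOException {
        String line;
        while ((line = in.readLine()) != null && !line.trim().equals("quit")) {
            out.println(execute(line));
        }
    }
}
```

Keeping execute separate from the read loop means the same commands can be driven from a script or a test, which also helps when the REPL doubles as the demo.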

Big Ball of Mud

All of the entities were in a Common JAR in a single package.

more to come…