Evan d'Entremont

intentionally provocative musings on tech


A Trip into Software Hell.

It’s not about how you start. It’s about how you finish. – Kobe Bryant

In the software world, some codebases are so tangled and outdated that they feel like a trip into software hell. This post recounts my experience with one such codebase—a nightmare that was as much a lesson in poor practices as it was a survival challenge. It’s a story about legacy code, the harsh realities of software development, and the irony of success despite everything.

Imagine a root directory overflowing with over 1,000 PHP files, each a relic from a bygone era. Some dated back to the days of PHP 3, and the code was a chaotic mess of outdated practices. Magic quotes, long deprecated, were still in use. include and require statements were scattered haphazardly, functions were used inconsistently, and display output was emitted all over the place. Some files even served as functions themselves: instead of calling a function, you set a few globals and included the file.

Despite this chaos, the company emerged as a market leader, landing national contracts for major organizations. This highlights the value of being first to market and the edge that incumbents can have, regardless of the underlying systems.

I am not here to be liked. I am here to be respected. – Pat Riley

The work environment was a nightmare in its own right. Developers were underpaid, required to wear dress shirts, and endured a volatile atmosphere. The CEO once threw a chair through a wall out of frustration over a bug. Rumor had it the VP held a penalty-minutes record in a European hockey league. It was a perfect storm of poor management and bad working conditions.

Everyone except the developers had “manager” in their title, and the developers themselves were relegated to a windowless basement office. It was a truly dismal environment, and a clear signal of how little the development team’s needs mattered.

You have to learn to play together – Michael Jordan

When tasked with creating a mobile app and an API on top of this chaotic codebase, a normal approach wasn’t feasible. The existing system was so tangled and complex that direct integration with its functions was not practical. Refactoring the codebase was off the table, so I had to work with what I was given.

All code examples are reconstructed from memory, purely to illustrate the concepts.

Instead of trying to directly interface with the convoluted code, I had to rely on the site's own output and essentially export the data internally. Here’s how I managed the situation:

Spelunking with var_dump and HTML comments:

To determine what needed to be set in the PHP globals, I inserted var_dump calls and HTML comments into the pages. This allowed me to inspect the data structure and understand what was being set.

// At the top of the page where data was set
function dump_all_vars() {
    ob_start();
    var_dump($GLOBALS);
    $output = ob_get_clean();
    echo "<!-- START DEBUG DATA -->";
    echo "<pre>{$output}</pre>";
    echo "<!-- END DEBUG DATA -->";
}

// Call this function to include debug data in the output
dump_all_vars();

This method helped me figure out which globals needed to be set and how to manage the data without directly interacting with the complex codebase.

Setting up globals:

Since using any of the existing functions was impractical due to their complexity, and refactoring wasn’t an option, I relied on the output generated by the site itself. By including the pages that processed form submissions or listed information, I could indirectly access the data by abusing global variables.


// Start output buffering
ob_start();
// This sets up a $api_data global
include('path/to/form_submission_page.php'); 
// Clean (erase) the output buffer and turn off output buffering
ob_end_clean();

// Now $api_data is available for use, e.g.
$json = json_encode($api_data);

This approach involved letting the site do the heavy lifting. It was a workaround to deal with the limitations of the existing system and provided a way to interact with the data without needing to refactor the existing, overly complex codebase.
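To give a sense of how an API endpoint could wrap this trick, here’s a hedged sketch; the endpoint name, parameter, and file path are hypothetical stand-ins, not the real code:

function api_get_schedule($teamId) {
    // The legacy page reads its input from the superglobals, so stage it there
    $_GET['team_id'] = $teamId;

    // Run the legacy page; it sets $api_data as a side effect
    // (included here, its variables land in this function's scope)
    ob_start();
    include 'path/to/schedule_page.php';
    ob_end_clean(); // throw away the HTML the page rendered

    // Return only the captured data
    header('Content-Type: application/json');
    echo json_encode($api_data);
}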

It's not about being the best. It's about being better than you were yesterday. – Anonymous

When it came time to upgrade to PHP 5.6, the situation took another bizarre turn. Magic quotes, deprecated in PHP 5.3 and removed entirely in 5.4, were deeply ingrained in the codebase. The system relied on magic quotes for input sanitization, which meant the upgrade alone would break the application. To address this, I had to write a fake_magic_quotes function to replicate the old behavior. It was automatically included at the top of every page, ensuring that even on the new PHP version, the codebase continued to operate as if magic quotes were still active.

Here’s a simplified example of how the fake_magic_quotes function looked:

function fake_magic_quotes($value) {
    if (is_array($value)) {
        // Recurse into nested arrays, as magic_quotes_gpc did
        return array_map('fake_magic_quotes', $value);
    } else {
        return addslashes($value);
    }
}

// Automatically apply fake_magic_quotes to incoming data
$_GET = fake_magic_quotes($_GET);
$_POST = fake_magic_quotes($_POST);
$_COOKIE = fake_magic_quotes($_COOKIE);

This function recursively adds slashes to incoming data, emulating the magic quotes effect. The absurdity of maintaining such outdated practices in a modernized environment was a stark reminder of the tangled mess this codebase had become.
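I won’t swear this is exactly how it was wired up, but PHP’s auto_prepend_file directive is the standard way to run a file before every script without editing each of a thousand files by hand:

; php.ini
auto_prepend_file = /path/to/fake_magic_quotes.php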

You miss 100% of the shots you don’t take. – Wayne Gretzky

One of the most absurd aspects was our $40,000 monthly bill for a Postgres server, just to ensure a single query ran "quickly". The codebase handled UTF-8 character substitutions on both input data and server-side data, applying every possible substitution. This was done twice—once before sending data to the database and again on the data retrieved from the database.

The API had a "search by name" feature to automatically download schedules, something I was strongly opposed to on privacy grounds. I argued the point, was told to implement it anyway, and eventually had no choice. It took approximately one day before the feature was pulled for obvious reasons.

The original query employed nested REPLACE() functions for character substitutions directly in SQL, leading to a highly complex and inefficient query. Here’s a simplified reconstruction, illustrating the severe issues caused by relying solely on the database:

-- Original query with extensive REPLACE() functions

SELECT *
FROM players
WHERE REPLACE(
    REPLACE(
        REPLACE(
            REPLACE(
                REPLACE(
                    REPLACE(first_name, 'é', 'e'),
                    'ë', 'e'),
                'à', 'a'),
            'è', 'e'),
        -- More replacements for every possible UTF-8 character
    ),
    -- More replacements for every possible UTF-8 character
) = REPLACE(
    REPLACE(
        REPLACE(
            REPLACE(
                REPLACE(
                    REPLACE('Some input data', 'é', 'e'),
                    'ë', 'e'),
                'à', 'a'),
            'è', 'e'),
        -- More replacements for every possible UTF-8 character
    ),
    -- More replacements for every possible UTF-8 character
)
AND REPLACE(
    REPLACE(
        REPLACE(
            REPLACE(
                REPLACE(
                    REPLACE(last_name, 'é', 'e'),
                    'ë', 'e'),
                'à', 'a'),
            'è', 'e'),
        -- More replacements for every possible UTF-8 character
    ),
    -- More replacements for every possible UTF-8 character
) = REPLACE(
    REPLACE(
        REPLACE(
            REPLACE(
                REPLACE(
                    REPLACE('Some input data', 'é', 'e'),
                    'ë', 'e'),
                'à', 'a'),
            'è', 'e'),
        -- More replacements for every possible UTF-8 character
    ),
    -- More replacements for every possible UTF-8 character
);

This query performed character replacements for every possible UTF-8 character, and it did so twice: once on the search input and once on the column data. Because the REPLACE() chains were applied to the columns themselves, no index could help; every row had to be transformed at query time, turning every search into a slow, painful full-table scan. It was a classic case of failure due to a lack of proper training and oversight.

To improve performance, the query was refactored: normalize the stored data once at write time, normalize the input in PHP, and keep the query itself simple. (In my defense, array_replace didn’t exist yet.) Here’s the refactored version:

-- Trigger function to keep the normalized name columns in sync
CREATE OR REPLACE FUNCTION normalize_player_names() RETURNS trigger AS $$
BEGIN
    NEW.first_name_normalized := REPLACE(REPLACE(REPLACE(NEW.first_name, 'é', 'e'), 'ë', 'e'), 'à', 'a');
    NEW.last_name_normalized := REPLACE(REPLACE(REPLACE(NEW.last_name, 'é', 'e'), 'ë', 'e'), 'à', 'a');
    RETURN NEW;
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER normalize_player_names
BEFORE INSERT OR UPDATE ON players
FOR EACH ROW EXECUTE PROCEDURE normalize_player_names();

-- Index the normalized columns so lookups stop scanning the table
CREATE INDEX idx_players_names_normalized
ON players (first_name_normalized, last_name_normalized);

On the PHP side, input was normalized once before being bound to the query:

function normalizeString($value) {
    // Transliterate accented characters to ASCII equivalents;
    // //TRANSLIT behaviour depends on the current locale
    return iconv('UTF-8', 'ASCII//TRANSLIT//IGNORE', $value);
}

// Example usage
$sql = "SELECT * FROM players WHERE first_name_normalized = :first_name AND last_name_normalized = :last_name";
$stmt = $pdo->prepare($sql);
$stmt->execute([':first_name' => normalizeString('José'), ':last_name' => normalizeString('Döe')]);

In contrast to the original query’s nested REPLACE() chains, the refactored approach used triggers to keep the indexed first_name_normalized and last_name_normalized columns up to date, while PHP preprocessed the input before it ever reached the database. The query itself became a simple equality check that the index could serve.

The harder the battle, the sweeter the victory. – Les Brown

Plaintext password storage was another glaring issue: anyone with database access could read every user’s password. The practice was originally adopted so the support manager could look up passwords on demand, and he resisted change, arguing that without plaintext passwords he wouldn’t be able to log into customer accounts when required. It underscored a profound lack of understanding of, and concern for, data security.

To address this critical issue, I developed a proper authentication system. At least it was easy to import the passwords.
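A minimal sketch of that import, assuming PHP 5.5’s password_hash API; the table and column names are illustrative, not the real schema:

// One-time migration: the passwords were already plaintext,
// so hashing them in place was trivial (names are illustrative)
$rows = $pdo->query('SELECT id, password FROM users')->fetchAll(PDO::FETCH_ASSOC);
$update = $pdo->prepare('UPDATE users SET password = :hash WHERE id = :id');
foreach ($rows as $row) {
    $update->execute([
        ':hash' => password_hash($row['password'], PASSWORD_DEFAULT),
        ':id' => $row['id'],
    ]);
}

// Logins then verify against the hash instead of comparing plaintext
function check_login(PDO $pdo, $username, $password) {
    $stmt = $pdo->prepare('SELECT password FROM users WHERE username = :u');
    $stmt->execute([':u' => $username]);
    $hash = $stmt->fetchColumn();
    return $hash !== false && password_verify($password, $hash);
}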

This new system let the support team log in as users without exposing, or even needing access to, their plaintext passwords. The transition was necessary but frustrating: a lesson in improving data security and protecting user information while overcoming institutional resistance to change.
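Support access worked as impersonation rather than password lookup. A bare-bones sketch of the idea, with hypothetical session keys:

// After a support agent authenticates with their own credentials,
// switch the session to the target user; no user password involved
// (session keys are hypothetical, not the real implementation)
function impersonate($supportUserId, $targetUserId) {
    $_SESSION['user_id'] = $targetUserId;
    $_SESSION['impersonated_by'] = $supportUserId; // keep an audit trail
}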

I’ve failed over and over and over again in my life and that is why I succeed. – Michael Jordan

Reflecting on this experience, the irony is striking. Despite the code quality, the company succeeded based on client lists and market perception rather than the actual quality of its software. It’s a stark reminder of how sometimes, a company can thrive despite having a system that’s deeply flawed.

In the end, the experience was a harsh lesson in poor coding practices, bad management, and the ironies of success. It showed me that the game isn’t won with clean code.

Sometimes, it’s like scoring points with a broken play and hoping the ref doesn’t notice.

last updated 2024-09-22
