When building a company, you often have to make sacrifices. There is never enough time, money, or talent to get everything done. Sure, you’d like to build your product the right way, but you’ve got to get something built quickly so you can raise money, or get new features out the door so you can land that big client. There’s no time to plan! We have to move fast! I get it.
Then one day, you’ve finally got your feet under you. You’ve gained some traction, you’ve got revenue coming in, and you’re growing quickly. You’ve been bootstrapping and hustling and crushing it. You’ve arrived.
Now you’re starting to notice something is off. Your engineers used to be able to crank out new features quickly, but they’re now missing deadlines, and they’re having a hard time telling you when things are going to get done. Things are breaking more often in production, and you feel like you’re constantly in fire-fighting mode. The system seems to be slowing down, and your users are complaining about performance and bugs. Your employees seem burned out, maybe you’ve even lost a few people, and your new hires are having a hard time getting up to speed.
You are now bogged down with technical debt. All those corners you cut to get where you are now are killing your productivity, your morale, and your business. You avoided paying the true cost of software development, and now the debt is due, with interest.
Almost every company I consult with suffers from this problem to some degree. A certain amount of technical debt is excusable, but you need to start paying it down, or your productivity will grind to a halt. Sure, it will add overhead, but as your team gains momentum by clearing the roadblocks, you’ll start seeing improvements in every area of your business. This is unavoidable, so you might as well get started now. Let’s discuss some of the critical areas where you’re likely not paying the true cost of software development, and some important steps you can take to get on the right track.
Development Process
“We do Scrum, sorta” is a phrase I hear with almost every client I meet. This typically means they use Jira to track tasks, and they time-box their work into sprints. Maybe at the end of the sprint they release to production. That’s about it.
Yet they are missing the single most critical aspect of agile development: the retrospective. Agile development is founded on reflection and continuous improvement. Throw away every other aspect of Scrum, Kanban, or mob programming; if you aren’t doing a retrospective after every sprint, you are never going to improve. Yes, it takes time, but if you aren’t investing in your growth, you will never grow.
An effective retrospective is a process of reflection, communication, brainstorming, and experimentation. We discuss what went well and what didn’t during the last sprint. We communicate our frustrations to each other in a constructive and honest way. We brainstorm ideas for tweaking the process or communication to improve, and we commit to experimenting with a new process during the next sprint. The ideas that fail are discarded, and the ideas that succeed are baked into the process going forward.
Requirements Definition
There is often a language barrier between product management and engineering. Product managers typically think in terms of user benefits, while engineers think in terms of technical implementation. PMs think high level, while engineers are down in the weeds. Because of this communication breakdown, a lot of time and effort is wasted getting to a clear definition of the goal and how it should be implemented in fine detail.
Your PMs should capture the vision of the feature in an effective user story. They must then account for all the “what if” scenarios of various use cases. I typically follow the user story with a series of if-then statements that account for as many edge cases as possible. You should also include acceptance criteria, ideally in the form of a test case. While you certainly want to leave room for the developers to be creative, there should be no ambiguity about what success looks like.
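To make this concrete, here is a minimal sketch of acceptance criteria written as an executable test case. The story, the `request_reset` function, and its behavior are all hypothetical, invented purely for illustration; the point is that each “what if” edge case becomes an unambiguous assertion.

```python
# Hypothetical story: "As a user, I can request a password reset link by email."
# The if-then edge cases from the requirements become individual assertions.

def request_reset(email, known_emails):
    """Toy implementation: return a status string for a reset request."""
    if "@" not in email:
        return "invalid_email"
    if email not in known_emails:
        # Edge case from the story: don't reveal whether an account exists.
        return "ok"
    return "ok"

# Acceptance criteria, written as a test case:
known = {"alice@example.com"}
assert request_reset("alice@example.com", known) == "ok"        # happy path
assert request_reset("bob@example.com", known) == "ok"          # unknown email: same response
assert request_reset("not-an-email", known) == "invalid_email"  # malformed input
```

When the acceptance criteria are expressed this way, there is no ambiguity about what success looks like, and the developer gets a ready-made starting point for test automation.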
Code Reviews
After the retrospective, the code review is the second most important ritual in software development. This is not just about walking through code line by line; it’s about explaining the user story, validating that it has real success criteria, demonstrating that it works correctly, and ensuring the code is up to standard.
My process for a code review looks like this:
- Describe the problem to be solved — Review the feature request and explain the purpose and objectives.
- Review success criteria — Review the test cases associated with the feature. Make sure there is a clear definition of “done”.
- Review the code — Explain the architecture and design. Look for adherence to coding standards and best-practices. Ensure the code is clean and well documented. Look for opportunities to improve architecture, modularity, security, and performance.
- Demonstrate the code — Show that the feature works as intended in a test environment. Walk through test cases and prove that the feature is functionally complete.
- Review test and deployment automation — Ensure that some level of test and deployment automation were developed to support the feature. Show that the automated tests are integrated into the CI/CD pipeline.
- Capture notes in the pull request and task — Paste the notes from your code review in the pull request and the original task as an audit trail that the code review was completed successfully.
Code Management
When I engage with new clients, one of the first areas where I look for disorganization is the code repository. I’ve seen hundreds of feature branches that haven’t been removed since inception. I’ve seen commits with no commit message. I’ve seen developers commit directly to master without a code review. I’ve seen developers working against revisions that are out of sync with production. The horror… the horror!
For starters, use a consistent branching workflow, such as Gitflow. Every developer should create a new feature branch for each new story. When a feature is complete, it should be merged into a development branch. When a release is pushed to production, a release branch should be created as an artifact of that release. The feature branch should then be deleted after release.
Use pull requests to merge branches. Don’t commit straight into a main branch. No pull request should be approved without performing a code review first.
Create meaningful commit messages, and use descriptive names for branches. Your code history should be readable, just like your code.
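As a sketch of the branch lifecycle described above, here are a couple of small Python helpers that generate the git commands each step would run. The branch naming convention and story IDs are illustrative assumptions, not a standard; keeping these as pure functions makes the workflow easy to test and to wire into CI tooling later.

```python
# Generate the git commands for each step of a Gitflow-style lifecycle.
# Naming conventions (feature/..., release/...) are illustrative.

def feature_branch_commands(story_id, description):
    """Commands to start work on a new story, branching off develop."""
    branch = f"feature/{story_id}-{description.lower().replace(' ', '-')}"
    return [
        "git checkout develop",
        "git pull origin develop",
        f"git checkout -b {branch}",
    ]

def release_commands(version):
    """Commands to cut a release branch as an artifact of the release."""
    return [
        "git checkout develop",
        f"git checkout -b release/{version}",
        f"git push origin release/{version}",
    ]

print(feature_branch_commands("JIRA-123", "Reset password"))
```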
Architecture
Imagine if you were trying to build a house with no blueprint and no foundation. You decided to just start slapping boards together, because you needed to get the house built quickly. This might work if you were building a tree house, but not a skyscraper. The same goes for software.
- Use frameworks, and use them consistently. Keep them up to date. Don’t reinvent the wheel; use established frameworks like Laravel, Express, Rails, or Django.
- Use object-oriented design. Any interaction between components should be done either via an API or an object interface. Reusable components should be put into a class.
- Abstract and protect the database. The database layer carries the most security risk and is the most expensive part of the system to refactor. Create a solid data layer that is holy ground with respect to coding and security standards, and don’t allow any database interaction outside of this layer.
- Use APIs to communicate between major components. Adopt a micro-services model if possible. APIs allow systems to easily scale with additional hosts and a load balancer.
- Use design patterns wherever possible, e.g. MVC/MVVM, Factory, Singleton, Observer, etc.
- Use the SOLID Principles of object-oriented design.
- Limit nested loops and recursion, and generally pay attention to code complexity.
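To illustrate the “abstract and protect the database” point, here is a minimal sketch of a data layer, using Python’s built-in sqlite3 for brevity. The table, class, and method names are invented for the example; the principle being shown is that all SQL lives in one class and every query is parameterized, so calling code never builds query strings itself.

```python
import sqlite3

# A minimal data layer: all SQL is confined to this class, and every
# query is parameterized (the ? placeholders), which blocks SQL injection.

class UserRepository:
    def __init__(self, conn):
        self.conn = conn
        self.conn.execute(
            "CREATE TABLE IF NOT EXISTS users (id INTEGER PRIMARY KEY, email TEXT UNIQUE)"
        )

    def add(self, email):
        cur = self.conn.execute("INSERT INTO users (email) VALUES (?)", (email,))
        self.conn.commit()
        return cur.lastrowid

    def find_by_email(self, email):
        return self.conn.execute(
            "SELECT id, email FROM users WHERE email = ?", (email,)
        ).fetchone()

repo = UserRepository(sqlite3.connect(":memory:"))
uid = repo.add("alice@example.com")
print(repo.find_by_email("alice@example.com"))  # (1, 'alice@example.com')
```

The rest of the codebase talks to `UserRepository`, never to the database directly, so a later schema change only touches this one file.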
Data Model Design
The data model is the foundation of software. If there is one area to focus on early on, it is creating a clear, logical, scalable data model. Since every piece of code writes to or reads from the same set of data tables, it is incredibly difficult to alter your data model later on. I find that retrofitting a new data model to an existing codebase is the most expensive type of maintenance.
- Use appropriate data types. For example, do not store numeric or datetime data in varchar fields.
- Use views to abstract the data model from the code. This allows you to easily change the underlying data structure later on, without having to change the code.
- Create indexes on tables based on likely query usage. This greatly speeds up query performance. Indexes work best on selective fields, where any given value matches only a small fraction of the rows. For example, an index on a date-time field, where most values are unique, will perform much better than an index on a field containing a handful of colors.
- Use reference tables for structured reference data. Most data models contain fields that have a pre-defined set of possible values. For example, cars may have a limited number of colors. A reference table contains all the possible values for a dimension, with an ID and any relevant metadata describing each option.
- Use link tables whenever there is a many-to-many relationship between two tables. Do not use redundant fields like field1, field2, field3. The link tables should contain foreign keys to both tables, and any metadata that describes the relationship between the two records. For example, the link table that links people to cars might contain the fields years_owned and is_primary_driver.
- When publishing reports, create reporting tables to greatly boost performance. You should avoid performing complex calculations at run time. Pre-cache reports by creating a table containing all the fields necessary to populate the report without any additional joins, and create indexes on the main filtering criteria.
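The reference-table and link-table patterns above can be sketched in a few lines of SQL; here they are using the people-and-cars example, run through Python’s sqlite3 so the schema is self-contained. Table and column names follow the example in the text, and the sample data is invented.

```python
import sqlite3

# Reference table (colors), link table (people_cars), and an index on a
# likely filter column, per the people/cars example.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE colors (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE people (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE cars (
    id INTEGER PRIMARY KEY,
    model TEXT NOT NULL,
    color_id INTEGER REFERENCES colors(id)  -- reference table, not a 'color' varchar
);
CREATE TABLE people_cars (                  -- link table for the many-to-many
    person_id INTEGER REFERENCES people(id),
    car_id INTEGER REFERENCES cars(id),
    years_owned INTEGER,                    -- metadata about the relationship
    is_primary_driver INTEGER,
    PRIMARY KEY (person_id, car_id)
);
CREATE INDEX idx_cars_color ON cars(color_id);
""")
conn.execute("INSERT INTO colors (id, name) VALUES (1, 'red')")
conn.execute("INSERT INTO people (id, name) VALUES (1, 'Alice')")
conn.execute("INSERT INTO cars (id, model, color_id) VALUES (1, 'Coupe', 1)")
conn.execute("INSERT INTO people_cars VALUES (1, 1, 3, 1)")

row = conn.execute("""
    SELECT p.name, c.model, col.name, pc.years_owned
    FROM people_cars pc
    JOIN people p ON p.id = pc.person_id
    JOIN cars c ON c.id = pc.car_id
    JOIN colors col ON col.id = c.color_id
""").fetchone()
print(row)  # ('Alice', 'Coupe', 'red', 3)
```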
QA and Test Automation
QA does not mean kicking the tires on a new feature. It doesn’t mean going through the main workflow, using only obvious valid inputs. QA is about trying to break the system in a thorough, predetermined, methodical way.
First, you should create test cases with a proper template, such as:
Test Case ID: A numeric value, which will be referenced by the test plan document. It could be the ID of the task in Jira.
Title: A single descriptive sentence, usually prefixed by the component it tests, like Login or Purchase.
Description: A short paragraph giving context about the feature you’re testing and how it should behave.
Preconditions: Things that must be done to set up the test case, like “Must be logged in as administrator”.
Test Steps: The detailed steps to execute the test case. They should be written as clearly and concisely as possible, and should be “dummy-proof”, meaning anyone can execute them without any prior knowledge of the system.
Expected Results: What you should expect to see if the test passes. Again, it should be clearly written with zero ambiguity. If the expected results aren’t met, the test fails.
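The template above can also be captured as a structure, so test cases can be collected and reported on programmatically. This is a sketch, and the example case content is invented; the field names simply mirror the template.

```python
from dataclasses import dataclass

# The test case template, as a structure. Content is illustrative.

@dataclass
class TestCase:
    case_id: str          # referenced by the test plan document
    title: str            # prefixed by the component it tests
    description: str
    preconditions: list
    steps: list
    expected_results: str

login_lockout = TestCase(
    case_id="TC-101",
    title="Login: account locks after repeated failures",
    description="Verify the account locks after too many bad passwords.",
    preconditions=["A test account exists", "Account is not locked"],
    steps=[
        "Navigate to the login page",
        "Enter the correct username and a wrong password five times",
        "Attempt to log in a sixth time with the correct password",
    ],
    expected_results="The sixth attempt is rejected with an 'account locked' message.",
)

assert login_lockout.title.startswith("Login:")  # component prefix convention
```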
As you create new test cases, you should capture them in a test plan document. This is typically a spreadsheet that lists all the test cases which should be run with each QA cycle. I use a master template which I duplicate for every release. As we run through each test case, we set a Pass/Fail value on the sheet, and link to a bug report in the case of a failure. As a part of our deployment process, we make sure QA has delivered a completed test plan prior to release.
In the test plan, I specify whether a test case is applicable for a smoke test, a full regression test, and/or post-deployment validation. I also note whether the test has been fully automated, and include a link to a follow-up bug task in case the test fails.
Deployment Automation and Server Configuration
Manual deployments introduce an incredible amount of risk. If your developers, who are likely under pressure to release on time, forget one step in a long deployment checklist, it can wreak havoc. They’ve got to ensure the correct code branch is deployed, database schema changes are promoted, reference data are deployed, and any new infrastructure changes are deployed consistently across the environment. There is too much room for error.
All of your deployments and resource provisioning should be scripted and automated. Use CI/CD tools like Jenkins or TeamCity to script your deployments, and configuration management tools like Chef to script your server configuration. Use Docker and Kubernetes, or a platform like Heroku, to automatically provision and scale your server infrastructure. For test automation, use Selenium, Mocha+Chai, PHPUnit, or the analog for your language of choice.
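The essence of a scripted deployment is that each item on the manual checklist becomes a step that runs in order and aborts the release on the first failure. Here is a minimal sketch of that idea; the step names are illustrative, and in practice each lambda would be a real script run by your CI/CD tool.

```python
# Each manual checklist item becomes a step; the pipeline stops at the
# first failure instead of a tired human skipping ahead.

def deploy(steps):
    """Run (name, step_fn) pairs in order; stop at the first failure."""
    completed = []
    for name, step in steps:
        if not step():
            return completed, name  # (what succeeded, what failed)
        completed.append(name)
    return completed, None

steps = [
    ("checkout release branch", lambda: True),
    ("run database migrations", lambda: True),
    ("load reference data",     lambda: True),
    ("run smoke tests",         lambda: True),
]
done, failed = deploy(steps)
print(done, failed)
```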
Monitoring and Alerting
When something does go wrong (and believe me, it will), how will you know? Will you wait for your clients to call and complain? If you get hacked or DDoSed, would you like to know right away, or after your platform is crushed?
You should set up robust application monitoring and alerting for your infrastructure resources, URLs and API endpoints, and error logs. For error logs, I am a big fan of the Elasticsearch/Logstash/Kibana (ELK) stack. If you’re on AWS, you can set this up as a turn-key service. This stack allows you to post logs in JSON format with tags and values, which you can then search and filter in Kibana. You can create dashboards, reports, and alerts based on log data. No longer will you have to SSH into servers and scan huge text files to find out what’s wrong with your system.
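Here is a minimal sketch of what “logs in JSON format with tags and values” looks like from the application side, using Python’s standard logging module. The field names (`order_id`, `gateway`) are invented for the example; the point is that each record becomes a searchable JSON document rather than a line in a giant text file.

```python
import json
import logging

# Emit log records as JSON documents with tags, the shape that
# Logstash/Elasticsearch ingest and Kibana can filter on.

class JsonFormatter(logging.Formatter):
    def format(self, record):
        payload = {
            "level": record.levelname,
            "message": record.getMessage(),
        }
        # Merge in any structured tags attached via the `extra` parameter.
        payload.update(getattr(record, "tags", {}))
        return json.dumps(payload)

logger = logging.getLogger("app")
handler = logging.StreamHandler()
handler.setFormatter(JsonFormatter())
logger.addHandler(handler)

logger.error("payment failed", extra={"tags": {"order_id": 42, "gateway": "stripe"}})
```

In Kibana you could then filter on `gateway: stripe` or alert when the count of `level: ERROR` spikes.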
For endpoint testing, I use Runscope and New Relic. You can create simple tests that hit your key URLs and APIs and parse the responses. You can then create alerts that will tell you if your system is not responding, or is returning errors.
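The check itself is simple enough to sketch: probe an endpoint, then assert on the status code and response time. This is an illustration of the idea rather than how Runscope or New Relic implement it; the evaluation logic is kept separate from the HTTP call so it can be tested without a live endpoint.

```python
# Evaluate one probe of an endpoint: healthy unless it errored or was slow.

def evaluate(status_code, elapsed_seconds, max_seconds=2.0):
    """Return (healthy, reason) for a single endpoint probe."""
    if status_code >= 500:
        return False, f"server error {status_code}"
    if elapsed_seconds > max_seconds:
        return False, f"slow response: {elapsed_seconds:.1f}s"
    return True, "ok"

# In production this would wrap a real HTTP call, e.g. with urllib:
#   import time, urllib.request
#   start = time.time()
#   resp = urllib.request.urlopen("https://example.com/health")
#   healthy, reason = evaluate(resp.status, time.time() - start)

print(evaluate(200, 0.3))   # (True, 'ok')
print(evaluate(503, 0.3))   # (False, 'server error 503')
```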
Security
Security is the great bogeyman of software development. No one can ever be perfectly secure, and investing in security does hinder agility. It’s an inconvenience, but a necessary one. Still, I’ve seen companies that completely disregard security, and can only be persuaded to invest in it by a carrot or a stick: either they must become HIPAA or PCI DSS compliant to avoid fines (the stick), or they can sell enhanced security as a differentiating factor (the carrot).
I approach security as a set of business risks. Security risks largely fall into three categories: system availability, data loss, and data exposure. I create a risk register for my clients that includes all the things that could go wrong given their level of security, and the impact each would have on the business in these three areas. The business then prioritizes the risks, and we work to plug the holes. I always refer to the OWASP Top 10 guidelines for secure coding standards, and the PCI DSS standards for infrastructure, database, and physical security.
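A risk register doesn’t need to be fancy; here is a minimal sketch of one, scored by likelihood and impact and sorted for prioritization. The entries and scores are entirely made up for illustration, and real registers are usually just a spreadsheet.

```python
# A toy risk register: score = likelihood x impact, highest first.
# Entries and scores are illustrative only.

risks = [
    {"risk": "SQL injection in legacy reports", "category": "data exposure",
     "likelihood": 4, "impact": 5},
    {"risk": "single database host fails",      "category": "availability",
     "likelihood": 2, "impact": 4},
    {"risk": "no tested backups",               "category": "data loss",
     "likelihood": 3, "impact": 5},
]

for r in risks:
    r["score"] = r["likelihood"] * r["impact"]

prioritized = sorted(risks, key=lambda r: r["score"], reverse=True)
print(prioritized[0]["risk"])  # 'SQL injection in legacy reports'
```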
Documentation
Ahh, documentation, the bane of every coder’s existence, right? We don’t want to create documentation that sits on a shelf, or becomes obsolete as soon as it’s written. We don’t have time to write docs, we need to push more features!
When your company grows, and you start hiring new people, how will you train them? How will they discover how the system works? How will they know what will break if they change some logic? If you don’t invest in documentation, you will have to use the time of your other developers to train every new person who joins the team. If you invest in documentation once, you can onboard many new people with the same investment.
I’m a champion of agility in all aspects of business, and documentation is no exception. I’m a big fan of wikis like Confluence for capturing bare-bones documentation that can be easily updated and referenced. Do anything you can to capture the information: take screenshots, record videos and screen captures, whatever it takes to get it down so others can refer to it later. Don’t have time to develop sophisticated architecture diagrams? Sketch them on a whiteboard and take a picture with your phone.
Operational Support Tools
All platforms need maintenance. You may have scheduled tasks that fail, data cleanup tasks, and user support tasks that all need to read from and write to the database. Any repetitive operational support task should be automated with a simple software tool. Most companies, however, avoid the cost of building these tools, and end up spending much more in the long run by pulling developers away from development to support the system.
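As one example of the kind of small tool meant here, consider requeuing failed scheduled jobs, a task that otherwise falls to a developer poking at the database by hand. This is a hedged sketch: the job structure and the retry policy are invented for illustration.

```python
# Find failed scheduled jobs and requeue them, up to a retry limit,
# instead of having a developer reset rows by hand.

def requeue_failed(jobs, max_attempts=3):
    """Reset failed jobs to 'pending' until they exhaust their attempts."""
    requeued = []
    for job in jobs:
        if job["status"] == "failed" and job["attempts"] < max_attempts:
            job["status"] = "pending"
            job["attempts"] += 1
            requeued.append(job["id"])
    return requeued

jobs = [
    {"id": 1, "status": "failed", "attempts": 1},
    {"id": 2, "status": "done",   "attempts": 1},
    {"id": 3, "status": "failed", "attempts": 3},  # out of retries
]
print(requeue_failed(jobs))  # [1]
```

A tool like this, run on a schedule, frees the team from routine firefighting while still flagging jobs that have exhausted their retries.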
Conclusion
Early in a company’s life cycle, you can get away with avoiding a lot of this overhead. You can scrape together an MVP and get to market quickly. You can scale to a certain point before the seams start to tear. Eventually, though, you will have to get with the program and start paying the true cost of software development. The longer you wait, the more expensive it will be to retrofit.
How much of this do you think applies to you? Come on, you know it, I know it, we both know it. Don’t let this be you: