What is a scalable infrastructure?

A scalable infrastructure is a combination of software components that are designed to handle data increases without hurting performance, reliability, or UX. It’s usually one that’s flexible enough to support business needs without costing you millions.

Why do I need a scalable infrastructure?

You need a scalable infrastructure to offer a good quality of service to your customers as you grow without hurting your:
- Business finances
- Team’s work-life balance
- User experience
- Performance and loading times
- Credibility and reliability

8 Best Practices for Building a Scalable Infrastructure

Frequent downtime, server overload, or an overworked team can be hints of poor software infrastructure. Here are recommendations for building a more scalable one.

by Matias Emiliano Alvarez Duran

05/21/2024

Picture your software infrastructure like a game of Jenga, a mix of blocks that make up a high tower. Like in the game, to scale your infrastructure, the base needs to be strong and steady. Otherwise, the tower collapses.

To build a scalable infrastructure, you need to make decisions at the design stage that support your growth.

In this article, we’ll show you how to spot issues with your infrastructure in the short and long term. We’ll also include best practices for building a scalable infrastructure with insights from senior engineers.

Your infrastructure is failing but your team is overworked? Hire NaNLABS as your technical sidekick to spot the root cause and revamp it for you.

How to spot scalability issues with your infrastructure

It’s quite easy to know when something is wrong with your software. However, identifying what’s causing those problems is both tricky and time-consuming.

Many people patch problems with quick solutions, workarounds, or by offering discounts. But this is risky. It can turn your technical debt into real financial loss. So, pay attention to the list below to spot potential scalability issues.

Infographic showing short and long-term signs of an unscalable infrastructure

Short-term signs of a weak infrastructure:

Slow response times due to databases failing to process the data and traffic
Frequent downtime and outages due to poor hardware or server selection
High customer churn and increased complaints thanks to poor performance and service
Low user retention due to frustration with the overall experience

Long-term issues due to not having a scalable database infrastructure:

An overworked software team that doesn’t have time to find and solve the root cause of the problem
High costs of storage due to picking the wrong database or improper data handling
Feature hacking: spending a lot of resources to emulate a competitor’s functionality that you lack
Servers running hot since they’re working overtime to process the high volume of requests and user traffic
Inability to build integrations with new tools thanks to legacy components that are outdated and incompatible with new technology
Decreased LTV (Customer Lifetime Value) and higher CAC (Customer Acquisition Cost), as poor quality drives users to churn soon after joining and makes it harder to earn new users’ trust
Reduced market competitiveness and a higher risk of losing future investment, for the reasons mentioned above

A poor infrastructure, in extreme cases, could cause the end of your business.

8 Best practices for building a scalable infrastructure

You always need to think about the infrastructure, whether you’re designing a new platform or migrating from on-premise to the cloud. For example: which technology to use, which database is the right fit, and how you’ll scale in the future.

Here are 8 best practices we follow to build a resilient and scalable infrastructure. We include inputs from the engineering team at Strava, the running, hiking, and cycling app, and Gustavo Alberola, Software Developer Advocate at NaNLABS.

1. Review the state of your current infrastructure

Run an audit to explore potential issues with your database, servers, and legacy code. Come up with insights and set up a plan to modernize and strengthen your infrastructure.

Let’s take a look at Strava’s example. “Back in 2009, we settled on Ruby on Rails and MySQL, which was a pretty reasonable trade-off at the time. But then we grew, and that infrastructure began to struggle,” says Jacob Stultz, Senior Staff Engineer at Strava.

Its infrastructure had issues with scaling the data storage, retrieval systems, and application logic. This initial analysis of the platform’s infrastructure allowed the Strava team to research other alternatives and gradually solve the problems.

2. Design to scale

You can't always predict what will happen. New technology could make previous solutions obsolete, and your business needs may change. Also, the trade-offs that made sense at the beginning might not be valid years later. However, designing for scalability will make it easier for you to address issues as your business grows.

Let’s use databases as an example. These pose a typical trade-off: simplicity now vs. complexity later on.

“Having a single SQL-like database used for everything is less work than maintaining several different ones, but it might become a bottleneck in the future,” says Gustavo Alberola, Software Developer Advocate at NaNLABS.

This happened to Strava. The team realized that the database they’d built the app on was not fulfilling their needs. The design was set to fail from the beginning.

While it’s common practice to push code fast in the early stages, it’s best practice to design apps that scale.

2.1. Think of horizontal scalability

Services that can scale horizontally are a better alternative for scalability in terms of throughput. This means you can add more boxes to the side when needed, rather than having to buy more powerful hardware, which is what happens with vertical scaling.
This, in turn, has a trade-off. Horizontal scalability requires a specific design that accounts for things like sharding, that is, splitting your data across several nodes.
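
To make the sharding idea concrete, here is a minimal Python sketch of hash-based routing. The shard names and the `route` helper are hypothetical and only illustrate the principle: every key is mapped to one of several boxes in a stable way, so adding throughput means adding boxes.

```python
import hashlib

# Hypothetical shard names; in practice these would be database hosts.
SHARDS = ["db-0", "db-1", "db-2"]


def shard_for(key: str, shard_count: int) -> int:
    """Map a key to a shard index using a hash that is stable across runs."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % shard_count


def route(user_id: str) -> str:
    """Return the shard that owns this user's data."""
    return SHARDS[shard_for(user_id, len(SHARDS))]
```

Note the design cost: with plain modulo hashing, changing the number of shards moves most keys to a different shard. Consistent hashing is a common refinement that reduces how much data has to move.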

3. Use infrastructure as code (IaC)

Design a scalable product from the beginning by treating your infrastructure as lines of code. This modern approach makes it easy to scale up or down as needed—you just need to tweak the text.

“This allows us to automate the creation and maintenance of the infrastructure. It also reduces the chance of human error when adding new environments, performing changes, and fixing bugs,” says Gustavo.

For example, by treating your infrastructure as code, you can set up CI/CD pipelines to automate testing and fix any issues as they appear.

“If you see scalability in the number of services your company will create and maintain, planning to have several teams producing new services manually might be unscalable. This is a great case for IaC, to automate new services coming in,” says Gustavo.

Plus, if you’re going for a microservices approach for more flexibility, IaC will simplify adoption.

“It also allows a version control over infrastructure, allowing it to revert to a previous stage if something goes wrong,” says Gustavo.
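
As a rough illustration of the idea, the Python sketch below describes the desired infrastructure as data and computes the changes needed to reach it. The `Service` fields and the `plan` function are assumptions made up for this example, not the API of any real tool. Real IaC tools such as Terraform follow a loosely similar plan-then-apply flow.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Service:
    """Hypothetical description of one deployable service."""
    name: str
    instances: int
    memory_mb: int


def plan(desired: dict[str, Service], actual: dict[str, Service]) -> list[str]:
    """List the actions that would turn the actual state into the desired one."""
    actions = []
    for name, spec in sorted(desired.items()):
        if name not in actual:
            actions.append(f"create {name}")
        elif actual[name] != spec:
            actions.append(f"update {name}")
    for name in sorted(actual):
        if name not in desired:
            actions.append(f"delete {name}")
    return actions


# The desired state lives in version control; the actual state is what runs today.
desired = {
    "api": Service("api", instances=4, memory_mb=512),
    "worker": Service("worker", instances=2, memory_mb=1024),
}
actual = {
    "api": Service("api", instances=2, memory_mb=512),
    "legacy-cron": Service("legacy-cron", instances=1, memory_mb=256),
}

print(plan(desired, actual))
```

Because the desired state is plain text, scaling the `api` service is a one-line change that can be reviewed, tested in a pipeline, and reverted like any other commit.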

3.1. Replicate your system

It’s good practice to replicate the information and multiply the throughput instead of putting all the stress into a single node.

“This is especially useful in processes that might not require real-time information, but need to process high volumes of data, e.g., processing analytics,” says Gustavo.

However, you should analyze this decision as replicating your system could hurt consistency.
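
The consistency trade-off can be shown with a small Python sketch. It assumes a hypothetical `ReplicatedStore` where writes go to a primary and reads are spread across replicas that only catch up when `replicate()` runs. This is a toy model of replication lag, not a real database client.

```python
import itertools


class ReplicatedStore:
    """Toy primary/replica store: writes hit the primary, reads hit replicas."""

    def __init__(self, replica_count: int) -> None:
        self.primary: dict[str, int] = {}
        self.replicas: list[dict[str, int]] = [{} for _ in range(replica_count)]
        # Round-robin over replicas so read load is spread evenly.
        self._next_replica = itertools.cycle(range(replica_count))

    def write(self, key: str, value: int) -> None:
        self.primary[key] = value

    def replicate(self) -> None:
        """Copy the primary's state to every replica (models asynchronous sync)."""
        for replica in self.replicas:
            replica.clear()
            replica.update(self.primary)

    def read(self, key: str):
        replica = self.replicas[next(self._next_replica)]
        return replica.get(key)


store = ReplicatedStore(replica_count=2)
store.write("page_views", 10)
stale = store.read("page_views")   # replicas have not caught up yet
store.replicate()
fresh = store.read("page_views")   # now the replicas agree with the primary
```

Reads scale with the number of replicas, but a read issued before replication completes can return old or missing data. That is acceptable for analytics and a problem for anything that needs the latest value.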

4. Opt for managed products

Choosing managed products over self-managed ones helps reduce operational overhead. You pay a fee, and in return you get higher availability, a secure platform, and a faster path to market. You also save money on operational costs and spare your engineers the time they’d spend developing and maintaining in-house solutions.

Also, managed products make it easier to scale because you can increase or reduce your requirements as needed.

Strava seems to agree, as it chose Amazon Aurora and Apache Cassandra to scale horizontally and optimize performance. This is also an example of a polyglot database architecture, which helps build more scalable products.

5. Leverage cloud services

Cloud services are designed for scalability. This is because they don’t need you to go and buy hardware to host your software on-premises. Instead, you can use existing cloud services as blocks and compose them to serve you and your business.

Just like with managed products, you don’t need to go through the hassle of maintaining the system yourself.

“When using a cloud service, you also gain extra resources in maintenance (you don’t have to fix the service itself), gain from the experience of others, and benefit from new features being added,” says Gustavo.

Also, the licenses for cloud services are usually less expensive than self-hosted ones.

6. Go polyglot with your database selection

Databases are the heart of your service. Choosing the wrong ones can severely impact performance, data security, and business finances.

This opens up the all-time data engineering challenge: SQL or NoSQL?

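Relational databases are reliable, support consistent concurrency, and are widely supported. However, scaling SQL databases is often expensive because the simplest way to scale them is vertically, which means paying for a bigger machine to run the database. Scaling them horizontally, with read replicas or sharding, is possible but adds operational complexity.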

NoSQL databases are less standardized but are generally built for horizontal scalability. So, the decision for building a scalable infrastructure isn’t to choose between one or the other but to use a mix of the two.

Example of a polyglot database architecture

A diagram showing examples of database providers for different use cases - Martin Fowler

We believe it’s best practice to use a polyglot database architecture when building scalable applications. This is because you can assign each of your services the database that best fits its data model and access patterns.
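
Here is a minimal Python sketch of what that can look like in code. The two classes are hypothetical: `OrderRepository` uses the standard library's `sqlite3` as a stand-in for a relational database, and `SessionStore` uses a plain dictionary as a stand-in for a key-value store. Each service talks to the kind of store that matches how it reads and writes data.

```python
import sqlite3


class OrderRepository:
    """Relational stand-in: orders need transactions and aggregate queries."""

    def __init__(self) -> None:
        self.conn = sqlite3.connect(":memory:")
        self.conn.execute(
            "CREATE TABLE orders ("
            "id INTEGER PRIMARY KEY, user_id TEXT NOT NULL, total_cents INTEGER NOT NULL)"
        )

    def add(self, user_id: str, total_cents: int) -> int:
        with self.conn:  # commits on success, rolls back on error
            cursor = self.conn.execute(
                "INSERT INTO orders (user_id, total_cents) VALUES (?, ?)",
                (user_id, total_cents),
            )
        return cursor.lastrowid

    def total_for(self, user_id: str) -> int:
        row = self.conn.execute(
            "SELECT COALESCE(SUM(total_cents), 0) FROM orders WHERE user_id = ?",
            (user_id,),
        ).fetchone()
        return row[0]


class SessionStore:
    """Key-value stand-in: sessions are only ever looked up by their key."""

    def __init__(self) -> None:
        self._data: dict[str, str] = {}

    def put(self, session_id: str, user_id: str) -> None:
        self._data[session_id] = user_id

    def get(self, session_id: str):
        return self._data.get(session_id)
```

Keeping each store behind its own small interface also makes it easier to swap the provider later without touching the rest of the service.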

7. Avoid a single point of failure

Your architecture needs to support your infrastructure. A modular architecture, for example, lets you protect yourself in case something fails. With everything split into modules, if one part crashes, the rest of your infrastructure keeps running.

Avoiding a single point of failure means that no single component holds all of your data or handles all of your traffic.

“If it’s information, it lives in several places (replication) so if one box goes down, the others can still function. If it’s processing power, that request can be handled by any box in particular, allowing it to distribute traffic as the service sees fit,” explains Gustavo.
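
The processing-power half of that quote can be sketched in a few lines of Python. The `Node` class and `dispatch` function are hypothetical names for this example. The point is only that any healthy box can serve the request, so one box going down does not take the service with it.

```python
class NodeDown(Exception):
    """Raised when a node cannot serve requests."""


class Node:
    def __init__(self, name: str, healthy: bool = True) -> None:
        self.name = name
        self.healthy = healthy

    def handle(self, request: str) -> str:
        if not self.healthy:
            raise NodeDown(self.name)
        return f"{self.name} handled {request}"


def dispatch(nodes: list[Node], request: str) -> str:
    """Try each node in turn and return the first successful response."""
    for node in nodes:
        try:
            return node.handle(request)
        except NodeDown:
            continue  # skip the failed box and try the next one
    raise RuntimeError("no healthy nodes available")


nodes = [Node("box-a", healthy=False), Node("box-b"), Node("box-c")]
print(dispatch(nodes, "GET /feed"))
```

In production this job usually belongs to a load balancer with health checks, but the principle is the same: requests must never depend on one specific box.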

8. Iterate, iterate, and iterate

Lastly, don’t be afraid to try out your ideas. Let’s bring Strava’s example back for a second. Its development team started out using Rails and MySQL in 2009. “As our platform scaled, the initial setup became increasingly unsustainable, prompting us to transition to Redis as the data store for leaderboards [Strava’s app feature] in 2012,” says Jeff Pollard, Senior Software Engineer at Strava.

But in 2016, Strava noticed that the leaderboard infrastructure was facing many operational challenges, ranging from data inconsistencies to outages and scalability issues. “After diagnosing these problems, we realized that Redis's memory-intensive nature and single-threaded processing posed significant limitations,” added Jeff.

Keeping all of these issues under control cost Strava money and time. So, the team analyzed the problems and chose to redesign the architecture. “In late 2017, we switched from Redis to Cassandra for our canonical leaderboard storage, and then we changed the leaderboard service from an RPC system to a stream processing system,” adds Jeff.

But it turns out Cassandra had some limitations too, so Strava reintroduced Redis as a cache using Amazon ElastiCache.
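
A cache in front of a slower canonical store is often built with the cache-aside pattern. The Python sketch below is a generic illustration of that pattern and makes no claim about how Strava implemented it. `CachedReader`, its dictionaries, and the TTL value are all assumptions for the example, with plain dictionaries standing in for the cache and the database.

```python
import time
from typing import Callable


class CachedReader:
    """Cache-aside reader: check the cache first, fall back to the store on a miss."""

    def __init__(
        self,
        store: dict[str, list[str]],
        ttl_seconds: float,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self.store = store  # stand-in for the canonical database
        self.cache: dict[str, tuple[float, list[str]]] = {}  # key -> (expiry, value)
        self.ttl_seconds = ttl_seconds
        self.clock = clock
        self.store_reads = 0  # counts how often the slow store is hit

    def get(self, key: str) -> list[str]:
        now = self.clock()
        entry = self.cache.get(key)
        if entry is not None and entry[0] > now:
            return entry[1]  # cache hit: the store is not touched
        value = self.store[key]  # cache miss or expired entry
        self.store_reads += 1
        self.cache[key] = (now + self.ttl_seconds, value)
        return value
```

The TTL bounds how stale a cached leaderboard can get. A shorter TTL means fresher data and more load on the canonical store, so the right value depends on how much staleness the feature can tolerate.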

This example shows that your infrastructure needs will change over time and you’ll need to iterate many times in the future. Follow the scalable infrastructure best practices on this list, but also, revisit your components every once in a while to iterate and improve your infrastructure.

Diagram of Strava’s new system architecture using Rails and Cassandra

Rough diagram of Strava’s architecture shared at one of the AWS Loft events. Previously, all updates to the datastore were done by Rails.

Quick recap: What are ways to prepare my infrastructure for scaling?

Recognizing early signs of weakness in your infrastructure and addressing them proactively lets you avoid more severe issues down the road.

Some best practices for building a scalable infrastructure include reviewing the current state of your infrastructure and leveraging infrastructure as code. By following these best practices and learning from real-world examples like Strava's infrastructure improvement, you can build a robust and scalable foundation for your software.

However, we know that having a weak infrastructure can cause your team to be overworked. They’re always putting out fires so they have no time to find and solve the main problems. We get it, this happens to many of our clients—that’s how we ended up working as an augmented team for Equinix.

At NaNLABS, we’re devoted to helping your team solve the most complex or time-consuming issues with your software. This way, they can focus on keeping the ship afloat. From team augmentation to staffing and data engineering, we handle your problems as if they were our own. At NaNLABS, you’re not just a number.

Sounds good? Let’s get in touch.
