Architecture V1 in an Early Stage Startup

What is Architecture V1

  • Infrastructure platform and programming languages are decided.
  • Major components of the system are clearly identified and put into place.
  • The engineering team can independently develop any component rather than worrying about stepping on each other’s toes.
  • The system is secure.
  • The system is scalable for all major business use cases, commonly including a Read/Query and a Write path.
  • The workflow to publish code, review, merge, build, test, deploy, roll back, and promote staging to production is streamlined.
  • Monitoring and alerts are set up.
  • The system above is cost-efficient to run.
  • The architecture has enough room to evolve, for example by supporting new business cases or adding new MicroServices. In short, Architecture V1 should be built to run for at least 1 year without an urgent need for a V2.

What did we have to start?

Choices, Decisions and Retrospection

Platform: AWS vs GCP

Platform: Serverless vs K8s

  • Java is not a good language for Serverless because of its slow startup. The majority of our engineers are Java developers, though, and the team would need time to ramp up on JS or GoLang.
  • The team has tremendous knowledge operating MicroServices at scale. Picking Serverless gives up that strength.
  • Our data ingestion pipeline will have complex logic to process, aggregate and join data. We will need a local cache and access to a remote cache, which conflicts with Serverless’ nature of standalone logic.
  • Per our admittedly not very thorough research, Serverless may not be as performant as a regular Java app.
  • If we choose Serverless, we will make sure it is portable from Google Cloud Functions to AWS Lambda.
  • Testing will be different, and so will deployment. It means we move into a completely new world.
  • Cloud independent. We can easily port our system from GKE to EKS or AKS, thanks to K8s.
  • Every service is dockerized. This means we can build each piece with the language that we choose: Java, GoLang, React.js… A lot of flexibility.
  • The concepts of services, pods, load balancing and auto scaling are very familiar.
  • K8s is backed by Google, is gaining strong momentum in the industry, and is battle tested.
  • The system is also more compact than one operated as traditional EC2 + LB style micro services.

Programming Language: consolidate to one?

Separate the major components

  • Read and write are separated.
  • Each component can scale out horizontally.
  • DB is configured for HA.
  • The API is the layer that hides the actual implementation. For instance, we can replace the datastore without changing the APIs.
  • Most components are stateless by design.
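
As a rough illustration of the points above, here is a minimal Java sketch of a read path and a write path that share nothing except a storage interface. All names in it (Datastore, InMemoryDatastore, WriteApi, QueryApi) are hypothetical and chosen only for this example; they are not taken from our codebase. The in-memory map stands in for the real HA database.

```java
import java.util.Map;
import java.util.Optional;
import java.util.concurrent.ConcurrentHashMap;

public class ReadWriteSketch {

    /** Storage contract the API layer depends on (hypothetical name). */
    interface Datastore {
        void put(String key, String value);
        Optional<String> get(String key);
    }

    /** In-memory stand-in for the real database; an assumption for this sketch. */
    static final class InMemoryDatastore implements Datastore {
        private final Map<String, String> rows = new ConcurrentHashMap<>();

        @Override
        public void put(String key, String value) {
            rows.put(key, value);
        }

        @Override
        public Optional<String> get(String key) {
            return Optional.ofNullable(rows.get(key));
        }
    }

    /** Write path: keeps no state of its own, only a reference to the datastore. */
    static final class WriteApi {
        private final Datastore store;

        WriteApi(Datastore store) {
            this.store = store;
        }

        void ingest(String key, String value) {
            store.put(key, value);
        }
    }

    /** Read path: a separate component that can be scaled independently of writes. */
    static final class QueryApi {
        private final Datastore store;

        QueryApi(Datastore store) {
            this.store = store;
        }

        String query(String key) {
            return store.get(key).orElse("not found");
        }
    }

    public static void main(String[] args) {
        // Swapping InMemoryDatastore for another Datastore implementation
        // would not change WriteApi or QueryApi.
        Datastore store = new InMemoryDatastore();
        WriteApi writer = new WriteApi(store);
        QueryApi reader = new QueryApi(store);

        writer.ingest("order-1", "shipped");
        System.out.println(reader.query("order-1")); // prints: shipped
        System.out.println(reader.query("order-2")); // prints: not found
    }
}
```

Because neither API class holds data between calls, any number of replicas of each can run side by side, and the datastore behind the interface can be replaced without touching the callers.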

The system is secure

CI/CD

Monitoring/Alerts

  • Cost: NR pricing is CPU based. Since we do not have a large number of metrics but do have a high CPU count in our K8s cluster, the cost is too high in comparison with DataDog. DataDog charges about $18 a host, no matter how many CPUs the host has.
  • Retention: for some reason, our NR data retention is only 1 or 2 days, which is too short. DataDog allows a year.
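
The cost argument comes down to simple arithmetic, sketched below in Java. Only the $18 per host figure comes from the text above. The host count, the CPUs per host, and especially the per-CPU price are made-up placeholders for illustration, not real vendor pricing.

```java
public class MonitoringCostSketch {

    /** Host-based pricing: the bill grows with the number of hosts only. */
    static double perHostCost(int hosts, double pricePerHost) {
        return hosts * pricePerHost;
    }

    /** CPU-based pricing: the bill grows with hosts times CPUs per host. */
    static double perCpuCost(int hosts, int cpusPerHost, double pricePerCpu) {
        return hosts * cpusPerHost * pricePerCpu;
    }

    public static void main(String[] args) {
        int hosts = 10;            // hypothetical cluster size
        int cpusPerHost = 16;      // hypothetical: large K8s nodes
        double pricePerHost = 18.0; // figure quoted in the text
        double pricePerCpu = 5.0;   // hypothetical placeholder, not a real price

        System.out.println("Per-host pricing: " + perHostCost(hosts, pricePerHost));
        System.out.println("Per-CPU pricing:  " + perCpuCost(hosts, cpusPerHost, pricePerCpu));
    }
}
```

With few, large nodes, the CPU-based bill scales with the node size while the host-based bill does not, which is the pattern we ran into.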

Cost Efficiency

Architecture V1 has enough room to evolve

How long did it take to reach Architecture V1

Conclusion
