Learning
Oct 14, 2024
10 mins

A Better and More Cost-Effective Alternative to AWS SageMaker: Tensorfuse

Author
Agam Jain

Discover why Tensorfuse is a better alternative to AWS SageMaker for AI inference tasks. This blog compares Tensorfuse and AWS SageMaker, highlighting how Tensorfuse offers easier deployment, faster performance, and significant cost reductions, saving you up to 40% on AI inference workloads.

In this blog, we will explore Tensorfuse as an alternative to AWS SageMaker for running AI inference workloads. We will compare Tensorfuse and SageMaker on two key factors:

  1. Developer Experience: We’ll discuss various aspects of the developer experience, such as ease of getting started, learning curve, and technical limitations.
  2. Cost: We’ll compare EC2 vs. SageMaker costs and how you can save 40% by choosing Tensorfuse over SageMaker.

Overview

Amazon SageMaker

SageMaker markets itself as an all-in-one integrated development environment (IDE) that helps ML engineers build, train, and deploy ML models at scale. However, it offers many overlapping services under one name, which can make it difficult to differentiate between them. The best way to determine which one is right for you is to categorize them into low-code and code-based ML services. Here is a quick overview of SageMaker’s services and their use cases:

  1. Low-code ML: This includes Canvas and Jumpstart. Both are no-code or low-code platforms that allow you to train and deploy models. Jumpstart focuses more on foundational models, enabling you to select models from the Hugging Face Hub and other model registries. However, there is no straightforward way to bring your own custom model and deploy it via Jumpstart. Additionally, all models deployed through Jumpstart run on SageMaker instances, which are 40% more expensive than EC2 (more on this later).
  2. Code-based ML: This includes SageMaker Studio, which essentially offers EC2 instances equipped with the IDE of your choice (e.g., Jupyter notebooks, VSCode). You can bring your own custom code, make it compatible with SageMaker, and then train or deploy your models.

Tensorfuse

Tensorfuse is a serverless GPU runtime that operates in your own AWS account. It allows you to run any custom or open-source model on autoscaling infrastructure directly on EC2 instances. Under the hood, Tensorfuse configures a Kubernetes cluster, load balancer, and autoscaler within your AWS account. The best part? All of this takes less than an hour and can be set up with a few CLI commands, saving you from getting lost in the cloud console UI.

SageMaker vs Tensorfuse: Comparison

Now that we have a basic understanding of both platforms, we will explore Tensorfuse as an alternative for deploying models directly on EC2 instances, offering a 100x better developer experience while saving ~40% of the cost. For this blog, we will focus on running inference for any arbitrary custom or open-source model, comparing them on two axes: Developer experience and Cost.

Let’s dive deeper into each aspect.

Developer Experience

  • Getting Started: These are the steps required before you can run any code.
    • Sagemaker: To use SageMaker, you’ll need to create a separate VPC and configure it with subnet and gateway settings specific to SageMaker. Then, you’ll need to create a SageMaker subdomain using this CloudFormation stack. After that, you must attach the required permissions to launch SageMaker Studio, Jumpstart, notebooks, etc.

    • Tensorfuse: To get started with Tensorfuse, you just need to run the following commands from your command line (AWS CLI must be configured):

      • pip install tensorkube
      • tensorkube configure - This will automatically create a CloudFormation stack and a Kubernetes cluster along with all the necessary resources required to deploy and auto-scale your model. Everything will be set up on your AWS account under a separate VPC.
  • Deploying and autoscaling
    • SageMaker: You can’t directly deploy a pre-trained model on SageMaker. First, you need to convert the model checkpoints into a SageMaker-compatible model using the SageMaker SDK or the AWS SDK (Boto3). Then, you must create an endpoint configuration that specifies the properties used to invoke the endpoint. To enable autoscaling, you’ll need to configure an autoscaling policy and attach it to the endpoint. All of this requires familiarity with the SageMaker SDK, resulting in a steep learning curve (a rough sketch of these API calls appears after this list).

    • Tensorfuse: Tensorfuse is container-first. To deploy models using Tensorfuse, you just need a Dockerfile with your model (any model from Hugging Face will do) and the inference code; a minimal example of such inference code is sketched after this list. Then run the following command from your working directory:

      • tensorkube deploy --gpu-type a10g --gpus 1 - You can define the type and number of GPUs, as well as the min and max scale for your deployment, using simple CLI flags.
  • Technical limitations of using SageMaker
    • Slow startup: Every time a SageMaker instance starts, it takes ~5 minutes. SageMaker Studio speeds this up, but not without other issues. At Tensorfuse, we have custom Docker implementations with container start times of ~20 seconds.
    • SageMaker serverless endpoints do not support container images larger than 6 GB, making them unsuitable for running foundational models.
    • You cannot mount an EFS volume.
    • SageMaker endpoints have separate quota limits from EC2 instances, and those limits are more restrictive.
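
To make that learning curve concrete, here is a rough sketch of the Boto3 calls involved in deploying a model and attaching an autoscaling policy on SageMaker. The resource names, container image, model artifact location, instance type, and scaling targets below are illustrative placeholders, not a copy-paste recipe.

import boto3

# Placeholders -- substitute your own values.
ROLE_ARN = "arn:aws:iam::123456789012:role/MySageMakerExecutionRole"
IMAGE_URI = "<serving-container-image-uri>"
MODEL_DATA_URL = "s3://my-bucket/model.tar.gz"

sm = boto3.client("sagemaker")
scaling = boto3.client("application-autoscaling")

# 1. Register the model artifact and serving container as a SageMaker Model.
sm.create_model(
    ModelName="my-model",
    PrimaryContainer={"Image": IMAGE_URI, "ModelDataUrl": MODEL_DATA_URL},
    ExecutionRoleArn=ROLE_ARN,
)

# 2. Describe how the endpoint should be provisioned.
sm.create_endpoint_config(
    EndpointConfigName="my-endpoint-config",
    ProductionVariants=[{
        "VariantName": "AllTraffic",
        "ModelName": "my-model",
        "InstanceType": "ml.g5.xlarge",
        "InitialInstanceCount": 1,
    }],
)

# 3. Create the endpoint itself.
sm.create_endpoint(
    EndpointName="my-endpoint",
    EndpointConfigName="my-endpoint-config",
)

# 4. Register the endpoint variant as a scalable target and attach a target-tracking policy.
scaling.register_scalable_target(
    ServiceNamespace="sagemaker",
    ResourceId="endpoint/my-endpoint/variant/AllTraffic",
    ScalableDimension="sagemaker:variant:DesiredInstanceCount",
    MinCapacity=1,
    MaxCapacity=4,
)
scaling.put_scaling_policy(
    PolicyName="my-scaling-policy",
    ServiceNamespace="sagemaker",
    ResourceId="endpoint/my-endpoint/variant/AllTraffic",
    ScalableDimension="sagemaker:variant:DesiredInstanceCount",
    PolicyType="TargetTrackingScaling",
    TargetTrackingScalingPolicyConfiguration={
        "TargetValue": 70.0,
        "PredefinedMetricSpecification": {
            "PredefinedMetricType": "SageMakerVariantInvocationsPerInstance"
        },
    },
)

Four separate API objects (model, endpoint config, endpoint, scaling policy) just to serve one model is the learning curve we are referring to.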
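
On the Tensorfuse side, the inference code baked into your Dockerfile can be any HTTP server that loads your model and answers requests. Below is a minimal sketch using FastAPI and a Hugging Face pipeline; the framework, model, and route names are our own illustrative choices rather than requirements of Tensorfuse.

# app.py -- a minimal inference server to package in your Dockerfile.
from fastapi import FastAPI
from pydantic import BaseModel
from transformers import pipeline

app = FastAPI()

# Load the model once at startup so each request only pays for inference.
classifier = pipeline(
    "sentiment-analysis",
    model="distilbert-base-uncased-finetuned-sst-2-english",
)

class PredictRequest(BaseModel):
    text: str

@app.get("/health")
def health():
    # Lightweight readiness check for the load balancer.
    return {"status": "ok"}

@app.post("/predict")
def predict(request: PredictRequest):
    # Run the model on the incoming text and return its prediction.
    return {"result": classifier(request.text)}

Your Dockerfile installs the dependencies, copies app.py, and starts the server (for example with uvicorn); tensorkube deploy then turns that container into a running, autoscaling service as described above.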

Cost

Despite offering a worse developer experience, SageMaker instances are ~40% more expensive than their equivalent EC2 instances. Here’s a brief comparison of some of the most common GPU-accelerated instances on SageMaker and EC2:

| Instance Type | SageMaker Price (/hr) | EC2 Price (/hr) | TF + EC2 Price (/hr) |
| --- | --- | --- | --- |
| p5.48xlarge (8 x H100) | $113.07 | $98.32 | $99.12 |
| p4d.24xlarge (8 x A100) | $37.6885 | $32.7726 | $33.5726 |
| g5.xlarge (1 x A10G) | $1.41 | $1.006 | $1.106 |
| g6e.xlarge (1 x L40S) | Not available on SageMaker | $1.861 | $1.961 |

If you’re operating at scale and running 100,000 GPU hours per month on an A10G, your bill on SageMaker would be about $141K/month. With Tensorfuse + EC2, it would be roughly $110.6K/month, which works out to about $365K in yearly savings. All this while enjoying a superior developer experience and a faster time to market for your generative AI application.
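
As a quick sanity check, the savings above are simple per-hour arithmetic on the A10G row of the table:

# Back-of-the-envelope check using the g5.xlarge (1 x A10G) prices from the table above.
sagemaker_hourly = 1.41       # $/hr on SageMaker
tf_ec2_hourly = 1.106         # $/hr on EC2 plus the Tensorfuse fee
gpu_hours_per_month = 100_000

sagemaker_monthly = sagemaker_hourly * gpu_hours_per_month   # $141,000
tf_ec2_monthly = tf_ec2_hourly * gpu_hours_per_month         # $110,600
yearly_savings = (sagemaker_monthly - tf_ec2_monthly) * 12   # $364,800

print(f"SageMaker: ${sagemaker_monthly:,.0f}/month")
print(f"TF + EC2:  ${tf_ec2_monthly:,.0f}/month")
print(f"Savings:   ${yearly_savings:,.0f}/year")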

If you’re using SageMaker for GPU inference workloads and want to migrate to Tensorfuse + EC2 for a more cost-effective and streamlined solution, we offer white-glove migration services from SageMaker to your EC2 instances. Feel free to schedule a call with the founder: https://calendly.com/agam-jn/30min, and we’ll be in touch.

Tensorfuse Blog

Dive into our blog to get expert insights and tutorials on deploying ML models on your own private cloud. Stay up to date with all things open-source and stay ahead in the GenAI race. Subscribe to get updates directly in your inbox.

Get started with Tensorfuse today.

Deploy in minutes, scale in seconds.

import tensorkube

# Build the container image: start from the NVIDIA CUDA base image, add Python,
# system packages, Python dependencies, environment variables, and a custom build
# step (download_and_quantize_model is a user-defined function, defined elsewhere).
image = (
    tensorkube.Image.from_registry("nvidia/cuda")
    .add_python(version='3.9')
    .apt_install(['git', 'git-lfs'])
    .pip_install(['transformers', 'torch', 'torchvision', 'tensorrt'])
    .env({'SOME-RANDOM-SECRET-KEY': 'xxx-xyz-1234-abc-5678'})
    .run_custom_function(download_and_quantize_model)
)


# Runs once per container: load the model onto the GPU and share a reference to it.
@tensorkube.entrypoint(image, gpu='A10G')
def load_model_on_gpu():
    import transformers

    model = transformers.BertModel.from_pretrained('bert-base-uncased')
    model.to('cuda')
    tensorkube.pass_reference(model, 'model')


# Runs per request: fetch the loaded model and run inference on the input.
@tensorkube.function(image)
def infer(input: str):
    model = tensorkube.get_reference('model')
    # Run the model on the input.
    response = model(input)
    return response


