Run serverless GPUs on your cloud

Deploy and auto-scale generative AI models on your own infra. Pay for what you use, no idle costs.

Trusted by

  • The Forecasting Company
  • Lumina
  • Haystack


Ship fast.

Leave the heavy lifting to us.

Connect

Connect your cloud account (AWS, GCP or Azure) and Tensorfuse will automatically provision the resources to manage your infra.
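In practice this step amounts to granting Tensorfuse scoped access to your account so it can provision GPU infrastructure there. As a purely hypothetical sketch (neither this function nor these parameter names are confirmed Tensorfuse API; consult the docs for the real onboarding flow):

import tensorkube

# Hypothetical call: hand Tensorfuse a scoped IAM role so it can
# provision infrastructure inside your own AWS account.
# Function and parameter names here are illustrative only.
tensorkube.connect(
    provider='aws',
    role_arn='arn:aws:iam::123456789012:role/tensorfuse-provisioner',
)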

Deploy

Deploy ML models to your own cloud via the Tensorfuse SDK.

Data never leaves your cloud and you can start using an OpenAI compatible API.

import tensorkube

image = (
    tensorkube.Image.from_registry("nvidia/cuda")
    .add_python(version='3.9')
    .apt_install(['git', 'git-lfs'])
    .pip_install(['transformers', 'torch', 'torchvision', 'tensorrt'])
    .env({'SOME-RANDOM-SECRET-KEY': 'xxx-xyz-1234-abc-5678'})
    .run_custom_function(download_and_quantize_model)
)

@tensorkube.entrypoint(image, gpu='A10G')
def load_model_on_gpu():
    import transformers
    # Load the model once per worker and move it onto the GPU.
    model = transformers.BertModel.from_pretrained('bert-base-uncased')
    model.to('cuda')
    # Register the GPU-resident model so other functions can reuse it.
    tensorkube.pass_reference(model, 'model')

@tensorkube.function(image)
def infer(input: str):
    # Fetch the shared model reference and run inference on the input.
    model = tensorkube.get_reference('model')
    response = model(input)
    return response
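The pattern worth noting here: load_model_on_gpu runs once at startup, so the weights are loaded and moved to the GPU a single time, while pass_reference / get_reference let every subsequent infer call reuse that warm, GPU-resident model instead of reloading it per request.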




Scale

Tensorfuse automatically scales in response to the amount of traffic your app receives.

Fast cold boots with our optimized container system
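To illustrate what scale-to-zero looks like from the developer's side, here is a hypothetical sketch; min_workers and max_workers are illustrative names, not documented Tensorfuse parameters, so check the docs for the real autoscaling knobs:

import tensorkube

# `image` is the container image built in the Deploy example above.
# The autoscaling parameters below are hypothetical: the idea is that
# workers scale from zero up to a cap as request traffic rises.
@tensorkube.function(image, min_workers=0, max_workers=100)
def infer(input: str):
    model = tensorkube.get_reference('model')
    return model(input)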

Ease and speed of serverless.

Flexibility and control of your own infra.

Customize your environment

Describe container images and hardware specifications in simple Python. No YAML.

import tensorkube

image = (
    tensorkube.Image.from_registry("nvidia/cuda")
    .add_python(version='3.9')
    .apt_install(['git', 'git-lfs'])
    .pip_install(['transformers', 'torch', 'torchvision', 'tensorrt'])
    .env({'SOME-RANDOM-SECRET-KEY': 'xxx-xyz-1234-abc-5678'})
    .run_custom_function(download_and_quantize_model)
)

@tensorkube.use_image(image)
def infer():
    print('Your inference code goes here!')


Private by default

Your model and data live within your private cloud.

Scale at will

Meet user demand in real time by scaling GPU workers from zero to hundreds in seconds.

Cost effective

Reduce egress charges by running model inference inside your cloud environment.

OpenAI compatible

Start using your deployment on an OpenAI compatible endpoint (see the client sketch below).

Compute utilization

Easily utilize compute resources across multiple cloud providers.
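To make the OpenAI compatibility concrete, here is a minimal sketch using the official openai Python client; the base URL, API key, and model name are placeholders you would replace with your deployment's actual values:

from openai import OpenAI

# Placeholders: substitute your deployment's endpoint, key, and model name.
client = OpenAI(
    base_url="https://your-endpoint.example.com/v1",
    api_key="YOUR_API_KEY",
)

response = client.chat.completions.create(
    model="your-deployed-model",
    messages=[{"role": "user", "content": "Hello from my own cloud!"}],
)
print(response.choices[0].message.content)

Because the endpoint speaks the OpenAI wire format, any existing OpenAI-based code can be pointed at your deployment by changing only the base URL and key.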

Pricing for every team size.

Compute Management Costs

We charge compute management costs so that we can manage your infrastructure in a way that scales and stays fair across team sizes.

GPUs: $0.10 / GPU / hour
vCPUs: $0.007 / vCPU / hour
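For a rough sense of scale (the workload below is illustrative, not a quote): a deployment that keeps 2 GPUs and 16 vCPUs running around the clock for a 30-day month (720 hours) would accrue 2 × 720 × $0.10 = $144 in GPU management fees and 16 × 720 × $0.007 = $80.64 in vCPU management fees, on top of what your cloud provider bills for the compute itself.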

Free tier

$0 + Compute Management Cost

  • 1 seat included
  • 10 GPU hours / month free
  • Community support

Team

$150 + Compute Management Cost

  • 10 seats included
  • 10 GPU hours / month free
  • Support via private Slack

Enterprise

Custom + Compute Management Cost

  • Everything in the Team plan
  • Custom requirements, tailored to your needs

Get started with Tensorfuse today.

Deploy in minutes, scale in seconds.




© 2024. All rights reserved.

Privacy Policy