Deployment Options

Overview

The ideal cloud streaming platform would be perfectly elastic, infinitely scalable, and precisely schedulable. Those qualities are difficult to combine in a single system, so rather than building one system that compromises between them, the PureWeb Platform uses "Capacity Providers."

Each type of Capacity Provider excels in different situations. For time-bound events with a large number of users, you may want to discuss setting up a "Dedicated" provider with your sales representative ahead of time. For ad-hoc and low-volume situations, our "On-Demand" capacity may well fit the bill.

Alternatively, a Hybrid approach may be the best option: some Dedicated infrastructure is provisioned, and overflow requests are served by On-Demand resources.

See below for more information on each scenario.

Dedicated

Dedicated providers are just that: bespoke clusters of infrastructure that our team sets up for you, dedicated exclusively to your models.

There are many reasons to opt for a Dedicated provider. For example, if you expect a large influx of users (100+ concurrent sessions) in a known time frame, Dedicated providers let you reserve guaranteed capacity, so you have enough streaming capacity for all your users. In general, this provider type is a good fit when the usage patterns of an experience are known or predictable.

Additionally, Dedicated providers are more customizable than our On-Demand offerings: you can configure them with a wider variety of GPUs, CPUs, and memory allotments, and you can specify both elastic and scheduled scaling behaviors. Dedicated providers can also be configured to run in any of the geographies offered by our underlying cloud providers, which can improve the streaming experience depending on where your users are located. We can even work with you to provision your Dedicated instances with non-standard dependencies that your model may require.

Another benefit of Dedicated providers is that, because they are configured exclusively for your model, our platform pre-provisions your model onto these instances. As a result, the time from launching a stream to receiving the first frames of video is significantly shorter on Dedicated instances than on On-Demand instances.

In terms of global routing and load balancing, if your project is configured to use Dedicated capacity only, then requests will be routed to the nearest region that has Dedicated capacity for models in that project.

Finally, because Dedicated server infrastructure is exclusive to you, usage is measured in minutes of server up-time, regardless of how many users actually stream sessions from your Dedicated servers.

On-Demand

Our On-Demand providers leverage pools of shared compute resources that can run any customer model in a securely isolated environment. The PureWeb Reality platform maintains these pools of servers in a subset of our cloud provider regions, and ensures that they are configured to scale up to meet demand as it comes into the system.

The primary benefit of using an On-Demand provider is that it removes the burden of predicting or planning for capacity. Additionally, unlike Dedicated providers, where you pay for provisioned capacity regardless of how much streaming takes place, with On-Demand providers you pay only for what you use.
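To make the difference between the two payment schedules concrete, here is a minimal sketch that counts billable minutes under each one. It assumes simple per-minute metering; the `Session` type and both functions are hypothetical illustrations, and actual rates and billing rules come from your PureWeb agreement.

```python
from dataclasses import dataclass


@dataclass
class Session:
    """A single user's streaming session (hypothetical model for illustration)."""
    start_min: float  # minutes from the start of the billing period
    end_min: float


def dedicated_billable_minutes(servers: int, uptime_min: float) -> float:
    # Dedicated: every provisioned server is billed for its up-time,
    # regardless of how many users actually stream from it.
    return servers * uptime_min


def on_demand_billable_minutes(sessions: list[Session]) -> float:
    # On-Demand: only the time users actually spend streaming is billed.
    return sum(s.end_min - s.start_min for s in sessions)


if __name__ == "__main__":
    # A two-hour window on 10 Dedicated servers, during which only
    # three users stream for 30 minutes each.
    sessions = [Session(0, 30), Session(10, 40), Session(60, 90)]
    print(dedicated_billable_minutes(10, 120))   # 1200
    print(on_demand_billable_minutes(sessions))  # 90
```

Which schedule costs less depends on the per-minute rates in your agreement and on how fully your Dedicated capacity is used; these minute counts are only one input to that comparison.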

On-Demand providers are generally ideal for use cases where you want to run your model in an ad-hoc manner, or in a long-lived deployment that may have unpredictable usage over the lifetime of that deployment. If your project is configured to use On-Demand capacity only, the system will route users to the closest region with an On-Demand pool.

PureWeb’s On-Demand providers run across multiple clouds. Our platform ingests real-time data about latency and provider utilization, then makes routing choices that aim to minimize how long stream consumers wait while delivering the best-performing stream.

Limitations of On-Demand

In order to create a secure, shared runtime environment for models running in the On-Demand system, the underlying implementation differs significantly from our Dedicated providers. This results in some limitations both in terms of infrastructure location, and what types of models can run in the system.

Any model that requires an active Windows desktop, or a full Windows Server OS, can only be served from On-Demand providers based in North America.
DLSS is only supported from On-Demand providers based in North America.

Hybrid

If you want to leverage the benefits of both Dedicated and On-Demand providers, your project can be configured for a Hybrid deployment. This means that you can have some amount of dedicated compute capacity, and if the demand for your model exceeds that capacity, excess requests will flow into the nearest On-Demand provider.

Load balancing and routing in a Hybrid scenario is somewhat different than when using a single provider type:

If you’re only using Dedicated providers, then all users will be routed to the nearest (lowest network latency) region with free capacity. If all providers are at full capacity, requests will be sent to the nearest provider with the shortest queue.
If you’re only using an On-Demand provider, then all users will be routed to the nearest provider (lowest network latency).
Finally, if you’re using a Hybrid configuration, Dedicated providers that have free capacity and lower latency than the On-Demand providers are always given priority. When both Dedicated and On-Demand providers have free capacity and end-user latencies are similar, preference still goes to the Dedicated providers, even if a Dedicated provider has slightly higher latency (this lets you make the best use of your Dedicated resources). Once free Dedicated capacity is exhausted, the best On-Demand provider is selected, based on latency and provider load.
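The routing rules above can be summarized as a small selection function. This is a minimal sketch, not the platform's implementation: the `Provider` fields, the 20 ms "similar latency" tolerance, the load penalty used to rank On-Demand providers, and the queue-then-latency ordering for full Dedicated providers are all hypothetical, since the actual thresholds and weights are not documented here.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical tuning knobs; the platform's real thresholds are not documented.
SIMILAR_LATENCY_MS = 20.0  # how much slower a free Dedicated provider may be and still win
LOAD_PENALTY_MS = 50.0     # latency-equivalent cost of a fully loaded On-Demand provider


@dataclass
class Provider:
    name: str
    kind: str                # "dedicated" or "on_demand"
    latency_ms: float        # end-user network latency to this provider
    free_slots: int = 0      # Dedicated sessions that can start right away
    queue_length: int = 0    # requests waiting on a full Dedicated provider
    load: float = 0.0        # On-Demand utilization in [0, 1]


def select_provider(providers: list[Provider]) -> Optional[Provider]:
    dedicated = [p for p in providers if p.kind == "dedicated"]
    on_demand = [p for p in providers if p.kind == "on_demand"]
    free_dedicated = [p for p in dedicated if p.free_slots > 0]

    if not on_demand:
        # Dedicated only: nearest provider with free capacity...
        if free_dedicated:
            return min(free_dedicated, key=lambda p: p.latency_ms)
        # ...otherwise the shortest queue, with latency as a tie-breaker (assumed).
        return min(dedicated, key=lambda p: (p.queue_length, p.latency_ms), default=None)

    if not dedicated:
        # On-Demand only: nearest provider by network latency.
        return min(on_demand, key=lambda p: p.latency_ms)

    # Hybrid: favour free Dedicated capacity unless it is clearly farther away.
    if free_dedicated:
        best_dedicated = min(free_dedicated, key=lambda p: p.latency_ms)
        nearest_on_demand = min(p.latency_ms for p in on_demand)
        if best_dedicated.latency_ms <= nearest_on_demand + SIMILAR_LATENCY_MS:
            return best_dedicated
    # Dedicated capacity exhausted: overflow to the On-Demand provider with the
    # best combined latency and load score.
    return min(on_demand, key=lambda p: p.latency_ms + LOAD_PENALTY_MS * p.load)


if __name__ == "__main__":
    providers = [
        Provider("dedicated-us-east", "dedicated", latency_ms=45, free_slots=2),
        Provider("on-demand-us-east", "on_demand", latency_ms=35, load=0.3),
        Provider("on-demand-us-west", "on_demand", latency_ms=80, load=0.0),
    ]
    print(select_provider(providers).name)  # dedicated-us-east
```

Here the Dedicated provider wins because 45 ms is within the assumed 20 ms tolerance of the nearest On-Demand pool. With its free slots set to 0, the same call would overflow to `on-demand-us-east` (a score of 35 + 50 * 0.3 = 50, versus 80 for the West Coast pool).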

Dedicated / On-Demand Comparison

Feature	Dedicated	On-Demand
Engine Compatibility	Unreal & Unity	Unreal & Unity
Available Regions	Any region available within our cloud providers	North America (West Coast), North America (East Coast), UK, Germany, India, Japan, Australia
Capacity Guarantees	Available*	Best effort
Instance Capabilities	GPU: Nvidia T4, A10G; CPU: variable; Memory: variable	Mix of Nvidia T4 (8 vCPUs, 32 GB RAM) and Nvidia RTX 5000 (4 vCPUs, 24 GB RAM)
Sessions per Resource	1+, depending on model optimization	1
Custom Dependencies	Available*	None
Scale	High (1000s of users per model in each deployed region)	Moderate (<100 users per model in each deployed region)
Start Time	Typically 5-30 seconds, depending on model optimization	Typically 5-90 seconds, depending on model optimization, launch frequency, and region
Payment Schedule	Infrastructure up-time	User streaming time

* Denotes capabilities that carry an additional fee or alternative payment schedule.

What about launch times?

For Dedicated instances, assuming capacity is available, start times range from 5 to 30 seconds. Approximately 3 to 6 seconds of this is platform orchestration and routing; the rest depends on how quickly your game starts up. Any optimization that reduces your game's start time will improve the overall launch time of your streaming experience.

For On-Demand instances, assuming capacity is available, start times range from 5 to 90 seconds. Due to infrastructure differences, On-Demand launch times in North America are typically closer to 5-15 seconds.