Overview

An ideal cloud streaming platform would be perfectly elastic, infinitely scalable, and precisely schedulable, but those qualities are difficult to combine in a single system. Rather than building one system that compromises among them, the PureWeb Platform uses "Capacity Providers."

Each Capacity Provider excels in different situations. For time-bound events with a large number of users, you may want to discuss setting up a "Dedicated" provider with your sales representative ahead of time. For ad hoc use and smaller numbers of users, our "On-Demand" capacity may well fit the bill.

Alternatively, a Hybrid approach may be the best option: some Dedicated infrastructure is provisioned, and overflow requests are served by On-Demand resources.

See below for more information on each scenario.

Dedicated

Dedicated providers are just that: bespoke clusters of infrastructure that our team sets up for you, dedicated exclusively to your models.

There are many reasons to opt for a Dedicated provider. For example, if you expect a large influx of users (100+ concurrent sessions) within a known time frame, a Dedicated provider lets you reserve guaranteed capacity so that there is enough streaming capacity for all of your users. In general, this provider type is a good fit when the usage patterns of an experience are known or predictable.
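The core arithmetic behind reserving capacity is a ceiling division of the expected peak concurrency by the number of sessions each instance can host (1 or more on Dedicated instances, depending on model optimization). The Python sketch below is a hypothetical helper, not a PureWeb tool: the peak and the sessions-per-instance figure are inputs you would estimate for your own model, and it adds no headroom for spikes or failures.

```python
def instances_to_reserve(peak_concurrent_sessions: int, sessions_per_instance: int) -> int:
    """Smallest number of instances that can host the expected peak (hypothetical helper)."""
    if sessions_per_instance < 1:
        raise ValueError("sessions_per_instance must be at least 1")
    # Integer ceiling division: a partially used instance still has to be reserved.
    return (peak_concurrent_sessions + sessions_per_instance - 1) // sessions_per_instance


if __name__ == "__main__":
    # 150 expected concurrent sessions at the event's peak.
    print(instances_to_reserve(150, 1))  # 150: one session per instance
    print(instances_to_reserve(150, 4))  # 38: four sessions per instance
```

A better-optimized model that fits more sessions on each instance directly reduces how much capacity you need to reserve.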

Additionally, Dedicated providers are more customizable than our On-Demand offerings: you can configure them with a wider variety of GPUs, CPUs, and memory allotments, and you can specify both elastic and scheduled scaling behaviors. Dedicated providers can also run in any of our underlying cloud providers' geographies, which can improve the streaming experience depending on where your users are located. We can even work with you to provision your Dedicated instances with non-standard dependencies that your model may require.

Another benefit of Dedicated providers is that, because they are configured exclusively for your model, our platform pre-provisions your model onto these instances. As a result, the time from launching a stream to receiving the first frames of video is significantly shorter on Dedicated instances than on On-Demand instances.

In terms of global routing and load balancing, if your project is configured to use Dedicated capacity only, then requests will be routed to the nearest region that has Dedicated capacity for models in that project.

Finally, because Dedicated server infrastructure is exclusive to you, usage is measured in minutes of server up-time, regardless of how many users actually stream sessions from your Dedicated servers.

On-Demand

Our On-Demand providers use pools of shared compute resources that can run any customer model in a securely isolated environment. The PureWeb Reality platform maintains these server pools in a subset of our cloud provider regions and configures them to scale up to meet demand as it arrives.

The primary benefit of an On-Demand provider is that it removes the burden of predicting or planning for capacity. Additionally, unlike Dedicated providers, where you pay for your capacity regardless of how much streaming takes place, with On-Demand providers you pay only for what you use.
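The difference between the two payment schedules comes down to what gets metered. The hypothetical Python sketch below counts metered minutes for the same illustrative event under each model; the `Session` type, the function names, and the numbers are assumptions for illustration, and the sketch says nothing about actual rates or invoicing.

```python
from dataclasses import dataclass


@dataclass
class Session:
    """One user's streaming session, in minutes from the start of the event (hypothetical model)."""
    start: int
    end: int


def dedicated_metered_minutes(server_uptime_minutes: list[int]) -> int:
    # Dedicated: metered on server up-time, regardless of how many users streamed.
    return sum(server_uptime_minutes)


def on_demand_metered_minutes(sessions: list[Session]) -> int:
    # On-Demand: metered on user streaming time only.
    return sum(s.end - s.start for s in sessions)


if __name__ == "__main__":
    # Two Dedicated servers kept up for an 8-hour event (480 minutes each).
    uptime = [480, 480]
    # Three users who streamed for 30, 45 and 60 minutes during that window.
    sessions = [Session(0, 30), Session(100, 145), Session(200, 260)]
    print(dedicated_metered_minutes(uptime))    # 960
    print(on_demand_metered_minutes(sessions))  # 135
```

Which model costs less depends on rates this article does not cover; the sketch only shows that Dedicated meters the 960 minutes the servers were up, while On-Demand meters the 135 minutes users actually streamed.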

On-Demand providers are generally ideal for use cases where you want to run your model in an ad-hoc manner, or in a long-lived deployment that may have unpredictable usage over the lifetime of that deployment. If your project is configured to use On-Demand capacity only, the system will route users to the closest region with an On-Demand pool.

Current Limitations of On-Demand

To create a secure, shared runtime environment for models running in the On-Demand system, the underlying implementation differs significantly from our Dedicated providers. This places some limitations on which types of models can run in the system.

  1. On-Demand capacity only supports Unreal packages at this time.
  2. Any model that requires an active Windows desktop, or full Windows Server OS, will not work with an On-Demand provider.
  3. Any model that tries to launch an external process or executable will not be able to do so inside the container environment.
  4. DLSS and in-game video playback do not currently work with On-Demand providers.

Addressing these gaps is a major priority for the development team. By Q4 2022, we anticipate being able to eliminate all of the above limitations for On-Demand workloads running in North America.

Hybrid

If you want to leverage the benefits of both Dedicated and On-Demand providers, your project can be configured for a Hybrid deployment. This means that you can have some amount of dedicated compute capacity, and if the demand for your model exceeds that capacity, excess requests will flow into the nearest On-Demand provider.

Load balancing and routing in a Hybrid scenario differ somewhat from those used with a single provider type:

  • If you’re only using a Dedicated provider, then all users will be routed to the nearest (lowest network latency) region. If you have multiple Dedicated clusters within a region, users will be randomly assigned among the available options.
  • If you’re only using an On-Demand provider, then all users will be routed to the nearest (lowest network latency) region.
  • Finally, if you’re using a Hybrid configuration, users stay within predefined regions to avoid high-latency connections. For example, a request from North America is never connected to a server in Asia or Europe; it is always fulfilled by resources within North America. The Dedicated provider is selected first, and once it has no available capacity, users are routed to an On-Demand provider within the same region.
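The routing rules in this list can be condensed into a single selection function. The following is a hypothetical Python sketch, not the platform's actual implementation: the `Pool` type, the `route` function, the region labels, the per-region latency map, and the free-slot counter are all illustrative assumptions. It also assumes that a region whose pools are all full counts as having no capacity, which the text above does not spell out.

```python
import random
from dataclasses import dataclass
from typing import Optional


@dataclass
class Pool:
    """A hypothetical capacity pool: one Dedicated cluster or one On-Demand pool."""
    name: str
    kind: str        # "dedicated" or "on_demand"
    region: str      # illustrative region label, e.g. "north-america"
    free_slots: int  # remaining session capacity


def route(pools: list[Pool], mode: str, user_region: str,
          latency_ms: dict[str, float], rng: random.Random) -> Optional[Pool]:
    """Pick a pool for one user, or return None if no capacity is available.

    latency_ms maps each region to the user's measured latency to it (assumed input).
    """
    # Assumption: a pool "has capacity" only while it has free session slots.
    available = [p for p in pools if p.free_slots > 0]

    if mode in ("dedicated", "on_demand"):
        # Single provider type: use the nearest (lowest-latency) region offering that kind.
        of_kind = [p for p in available if p.kind == mode]
        if not of_kind:
            return None
        nearest = min(sorted({p.region for p in of_kind}), key=lambda r: latency_ms[r])
        # Several pools in the same region: assign the user randomly among them.
        return rng.choice([p for p in of_kind if p.region == nearest])

    if mode == "hybrid":
        # Hybrid: never leave the user's region; Dedicated first, then On-Demand overflow.
        for kind in ("dedicated", "on_demand"):
            local = [p for p in available if p.kind == kind and p.region == user_region]
            if local:
                return rng.choice(local)
        return None

    raise ValueError(f"unknown routing mode: {mode}")


if __name__ == "__main__":
    rng = random.Random(0)
    pools = [
        Pool("na-dedicated-1", "dedicated", "north-america", free_slots=0),
        Pool("na-on-demand", "on_demand", "north-america", free_slots=5),
        Pool("eu-dedicated-1", "dedicated", "europe", free_slots=10),
    ]
    latency = {"north-america": 20.0, "europe": 110.0}
    # Hybrid: the NA Dedicated cluster is full, so the user overflows to NA On-Demand, never Europe.
    print(route(pools, "hybrid", "north-america", latency, rng).name)     # na-on-demand
    # Dedicated only (under this sketch's assumption that full regions are skipped): Europe.
    print(route(pools, "dedicated", "north-america", latency, rng).name)  # eu-dedicated-1
```

The key design point the sketch captures is that Hybrid routing filters by region before it considers provider type, whereas single-provider routing filters by provider type and then picks the closest region.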

Dedicated / On-Demand Comparison

| Feature | Dedicated | On-Demand |
|---|---|---|
| Engine compatibility | Unreal & Unity | Unreal |
| Available regions | Any region available within our cloud providers | West & East North America; West & Central Europe; South East Oceania |
| Capacity guarantees | Available* | Best effort |
| Instance capabilities | GPU: Nvidia T4, A10G (coming soon: RTX 4000 / 5000 and A4000 / A5000 / A6000); CPU: variable; Memory: variable | A mixture of Nvidia T4 with 8 vCPUs and 32 GB of RAM, and Nvidia RTX 5000 with 4 vCPUs and 24 GB of RAM |
| Sessions per resource | 1+, depending on model optimization | 1 |
| Custom dependencies | Available* | None |
| Scale | High (1000s of concurrent users per model, in each deployed region) | Moderate (<100 users per model, in each deployed region) |
| Start time | Typically 5-30 seconds, depending on model optimization | Typically 5-90 seconds, depending on model optimization, launch frequency and region |
| Payment schedule | Infrastructure up-time | User streaming time |

* Denotes capabilities that carry an additional fee or alternative payment schedule.

What about launch times?

For Dedicated instances, assuming capacity is available, start times range from 5 to 30 seconds. Approximately 3 to 6 seconds of this is platform orchestration and routing; the rest depends on how quickly your game starts up. Any optimization that reduces the start time of your game will improve the overall launch time of your streaming experience.

For On-Demand instances, assuming capacity is available, start times range from 5 to 90 seconds. Several factors determine the launch time for a given model:

  • Model startup: as with Dedicated instances, part of the launch time depends on how quickly your model starts up. This is something you can optimize during model development.
  • Model size: to run securely in a shared-tenant environment, models are provisioned dynamically at launch time, which involves copying and unzipping your model. Larger models take longer to copy and decompress. As a general guideline, each gigabyte of model size adds approximately 15 to 20 seconds to the launch time in the On-Demand system.
  • Launch recency: On-Demand instances cache recently launched models, and a cached model does not need to be downloaded or unzipped again. If your model is not launched frequently, however, it is unlikely to remain cached, so most launches may incur the longer load time.
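The launch-time factors above can be combined into a rough back-of-envelope estimate. This hypothetical Python sketch uses the guideline figures from this article: 3 to 6 seconds of platform orchestration and routing, plus 15 to 20 seconds per gigabyte when an On-Demand instance must copy and unzip an uncached model. The function name, the assumption that the components simply add together, and the assumption that the 3 to 6 second overhead (given for Dedicated instances) also applies to On-Demand are illustrative, not measured behavior.

```python
def estimate_launch_seconds(model_size_gb: float, startup_seconds: float,
                            on_demand: bool, cached: bool = False) -> tuple[float, float]:
    """Return a rough (low, high) launch-time estimate in seconds.

    Hypothetical helper: assumes the components below simply add together.
    """
    # Platform orchestration and routing: ~3-6 s (stated for Dedicated; assumed for On-Demand).
    low, high = 3.0, 6.0
    # Time for your model itself to start up; you can optimize this.
    low += startup_seconds
    high += startup_seconds
    # Dedicated instances are pre-provisioned, and cached On-Demand models skip the copy/unzip.
    if on_demand and not cached:
        # Uncached On-Demand launches copy and unzip the model: ~15-20 s per GB.
        low += 15.0 * model_size_gb
        high += 20.0 * model_size_gb
    return low, high


if __name__ == "__main__":
    # A 2 GB model that takes about 10 s to start up.
    print(estimate_launch_seconds(2, 10, on_demand=False))              # (13.0, 16.0)
    print(estimate_launch_seconds(2, 10, on_demand=True))               # (43.0, 56.0)
    print(estimate_launch_seconds(2, 10, on_demand=True, cached=True))  # (13.0, 16.0)
```

In this example the copy-and-unzip term accounts for most of the uncached On-Demand estimate, which is why reducing package size helps most for infrequently launched models on On-Demand capacity.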

Reducing launch times for our On-Demand providers is currently a key focus for our product team. As mentioned above, in Q4 2022 we’ll be releasing a major overhaul of our On-Demand providers in North America. These new On-Demand providers will have none of the compatibility limitations of the current providers, and their launch times will be equivalent to those of a Dedicated provider.