Particle system overview
This page gives an overview of the high-level concepts behind PopcornFX particles, and defines common terms used throughout this wiki, such as particle mediums, evolvers, and samplers.
For an overview of the editor's interface, see Editor interface overview.
Birth
PopcornFX particles are added into the world through particle spawners. When editing, spawners appear in the realtime editor as Particle layers, and they contain the whole particle definition.
Figure: 8 particle spawners/layers in the treeview
Simulation
Notable particle evolvers are the …
Figure: Displaying particle-medium bounds in the realtime editor viewport
Rendering
Particles are rendered through particle renderers. A particle render medium can render one or more particle mediums. For example, if two particle layers have different evolve states but share the same particle renderer (that is, their renderer settings are identical), they will share the same render medium and be rendered in a single geometry batch / draw call.
Figure: Mesh particle renderer
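The batching rule described above can be illustrated with a small sketch. This is not PopcornFX code: the function name, the layer names, and the settings dictionaries are all hypothetical, and the only assumption carried over from the text is that layers whose renderer settings are identical end up in the same render medium.

```python
from collections import defaultdict

def build_render_mediums(layers):
    """Group layers by renderer settings (hypothetical helper).

    `layers` is a list of (layer_name, renderer_settings) pairs.
    Layers with identical settings land in the same group, which
    stands in for one render medium / one draw call.
    """
    mediums = defaultdict(list)
    for name, settings in layers:
        # Dictionaries are not hashable, so use a sorted tuple of items as key.
        key = tuple(sorted(settings.items()))
        mediums[key].append(name)
    return list(mediums.values())

# Hypothetical layers: two share a billboard renderer, one uses a mesh.
layers = [
    ("Sparks", {"type": "billboard", "texture": "spark.png"}),
    ("Embers", {"type": "billboard", "texture": "spark.png"}),
    ("Debris", {"type": "mesh", "mesh": "rock.fbx"}),
]
batches = build_render_mediums(layers)
```

Here "Sparks" and "Embers" fall into one group even though they could be simulated differently, while "Debris" gets its own.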
Advanced
Particle scripts can use particle samplers. Notable particle samplers are the …
Last but not least, particle effect attributes can be defined. They are vector values global to a specific instance of a whole particle effect, and can be accessed from scripts anywhere inside any layer of the effect. Attributes are useful because they let gameplay code control the behavior of a particle effect on a per-instance basis.
Figure: Turbulence sampler, 120K particles
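To make the per-instance nature of attributes concrete, here is a minimal sketch. It does not use the PopcornFX API: `EffectInstance`, `set_attribute`, `layer_velocity`, and the "Power" attribute are hypothetical names chosen for illustration. The only idea taken from the text is that each effect instance holds its own attribute values, which any layer script can read.

```python
class EffectInstance:
    """One running instance of an effect, with its own attribute values."""

    def __init__(self, defaults):
        # Each instance starts from a private copy of the effect's defaults.
        self.attributes = dict(defaults)

    def set_attribute(self, name, value):
        # Stand-in for gameplay code overriding an attribute at runtime.
        if name not in self.attributes:
            raise KeyError(f"unknown attribute: {name}")
        self.attributes[name] = value

def layer_velocity(instance, life_ratio):
    """Stand-in for a layer script that reads the 'Power' attribute."""
    power = instance.attributes["Power"]
    return tuple(p * (1.0 - life_ratio) for p in power)

DEFAULTS = {"Power": (0.0, 1.0, 0.0)}
torch_a = EffectInstance(DEFAULTS)
torch_b = EffectInstance(DEFAULTS)
# Gameplay code changes one instance only; the other keeps the default.
torch_b.set_attribute("Power", (0.0, 4.0, 0.0))
```

Evaluating `layer_velocity` at the same `life_ratio` gives different results for `torch_a` and `torch_b`, because the attribute value belongs to the instance and not to the effect definition.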
Debugging & Optimization
The effect editor also has a few debugging and optimization tools, namely:
Although not intended to be used directly by fx-artists (see the 'For developers' section in the main wiki page):
Figure: Particle-picking in the trace report mode
Performance
"But scripting is slow, right?"
There are three things that make Popcorn scripts super efficient:
1. Streams
A script is not run once for every particle, but instead, it is run once for a large batch of particles.
Each operation will run in a massively parallel fashion and process a whole array of values in one go.
When you write something like a = cos(b * c); and either 'b' or 'c' has a stream qualifier, the compiled script can be fed a large array of values, and that simple line will expand into two streamed operations: a vectorized multiplication that performs all the multiplications for the whole batch, then a vectorized cosine that performs all the cosines on the whole batch in one go.
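As a rough illustration of that expansion, the sketch below runs a = cos(b * c) as two whole-batch passes and compares the result with a per-particle loop. The function names are hypothetical, plain Python lists stand in for streams, and a real backend would use SIMD rather than list comprehensions. Here 'b' is treated as the streamed value and 'c' may be either a stream or a single constant.

```python
import math

def stream_mul(xs, y):
    """One streamed operation: multiply a whole batch in one pass.

    `y` may be another stream (a list) or a constant shared by the batch.
    """
    if isinstance(y, (int, float)):
        return [x * y for x in xs]
    return [x * yi for x, yi in zip(xs, y)]

def stream_cos(xs):
    """One streamed operation: cosine of every element of the batch."""
    return [math.cos(x) for x in xs]

def run_batch(b, c):
    """a = cos(b * c), executed as two streamed operations."""
    tmp = stream_mul(b, c)   # all multiplications first
    return stream_cos(tmp)   # then all cosines

def run_per_particle(b, c):
    """Reference version: the script evaluated once per particle."""
    return [math.cos(bi * c) for bi in b]

b = [0.0, 0.5, 1.0]   # streamed value, one entry per particle
c = 2.0               # constant across the batch
a = run_batch(b, c)
```

Both versions compute the same values; the batched form simply reorders the work so that each operation touches the whole array at once.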
2. Compiler
The script compiler performs optimizations, and is able to fold expressions pretty aggressively. It knows which values vary per particle, which values do not change across a whole particle batch, which values do not change across a whole simulation frame, and which values do not change at all, ever.
It can rewrite and regroup expressions according to how often their values change, and:
- precompute all the constant values at compile time
- perform some computations only once per batch
For example, let's say we have a curve sampler named "MyCurve" that is inside a layer (i.e. it cannot change at runtime), and we write the following:
Position = scene.axisForward() * (4*3 + pow(LifeRatio, 2) - 1.5) + scene.axisUp() * MyCurve.integrate(0, LifeRatio) / MyCurve.integrate(0, 1);
This is what the constant folding will produce:
Position = (LifeRatio * LifeRatio + 10.5).00x + (MyCurve.integrate(0, LifeRatio) * 3.478).0x0;
When the project is configured with a Z-up axis system, it will instead compile down to:
Position = (LifeRatio * LifeRatio + 10.5).0x0 + (MyCurve.integrate(0, LifeRatio) * 3.478).00x;
Here, 3.478 would be the inverse of the integral of that specific curve between 0 and 1. The compiler sees that the call MyCurve.integrate(0, 1) can be precomputed, calls it at compile time, and bakes the result in with the other constants. It is allowed to do that only because the curve is local to the layer and therefore cannot change at runtime. If the curve were an attribute sampler instead, the optimizer could not make any assumptions and would leave the call in the final script, as the curve data could be overridden at runtime by the game code.
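The equivalence between the original and the folded expression can be checked with a small sketch. The curve used here is a hypothetical stand-in (f(t) = 2t + 1), so its constant is 0.5 and not the 3.478 of the example above. The axis vectors assume the Y-up case with forward along +Z, and all function names are invented for illustration.

```python
AXIS_FORWARD = (0.0, 0.0, 1.0)  # assumption: Y-up system, forward along +Z
AXIS_UP = (0.0, 1.0, 0.0)

def curve_integrate(t0, t1):
    """Hypothetical stand-in for MyCurve.integrate, with f(t) = 2t + 1."""
    def antiderivative(t):
        return t * t + t
    return antiderivative(t1) - antiderivative(t0)

def position_original(life_ratio):
    """The expression as written in the script, evaluated naively."""
    forward_scale = 4 * 3 + pow(life_ratio, 2) - 1.5
    up_scale = curve_integrate(0.0, life_ratio) / curve_integrate(0.0, 1.0)
    return tuple(f * forward_scale + u * up_scale
                 for f, u in zip(AXIS_FORWARD, AXIS_UP))

# Computed once, playing the role of compile-time evaluation.
INV_TOTAL = 1.0 / curve_integrate(0.0, 1.0)

def position_folded(life_ratio):
    """The folded form: constants merged, pow replaced, division baked in."""
    return (0.0,
            curve_integrate(0.0, life_ratio) * INV_TOTAL,
            life_ratio * life_ratio + 10.5)
```

Only the terms that depend on LifeRatio remain in the folded version: 4*3 - 1.5 collapses to 10.5, and the division by the full integral becomes a multiplication by a precomputed constant.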
3. Optimized SIMD & GPU Backends
In addition to all that, the code that executes the script instructions is heavily optimized and hand-tuned, using each target platform CPU's specific SIMD instruction set.
Also, since v1.9.0, we have an experimental GPU backend that generates D3D11 GPU compute shaders, for even faster execution.