PARALLEL IMPLEMENTATION OF H.264
Steven Wirsz
VIDEO DECODING
CPU USAGE
MOTION ESTIMATION
INTERFACE
INTERFACE
DECODING CPU USAGE
ENCODING CPU USAGE
PARALLEL TASKS
TASKS VS SLICES
PARALLELISM
Task Level or Data Level?
1. Tasks
2. Groups of Pictures
3. B-frames
4. Slices of each image
5. Macroblocks / Motion Estimation
1. TASKS
PARALLEL TASKS
Task Level Decomposition
Downsides:
• CPU usage is not well distributed because the data load per task is unpredictable
• Similar to CPU pipelining: the number of tasks is limited, so only a limited number of cores can be used
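A minimal sketch of why an unbalanced task pipeline scales poorly, assuming one core per task and steady-state operation. The stage costs are hypothetical numbers chosen for illustration, not measurements from this work.

```python
def pipeline_speedup(stage_times):
    """Ideal steady-state speedup of a pipeline with one core per stage.

    Throughput is set by the slowest stage, so the speedup over running
    all stages sequentially is sum(stage_times) / max(stage_times).
    It can never exceed the number of stages.
    """
    if not stage_times or min(stage_times) <= 0:
        raise ValueError("stage_times must be non-empty and positive")
    return sum(stage_times) / max(stage_times)


# Hypothetical per-frame costs (arbitrary units) for four tasks.
balanced = [1.0, 1.0, 1.0, 1.0]
skewed = [4.0, 1.0, 2.0, 1.0]

print(pipeline_speedup(balanced))  # 4.0
print(pipeline_speedup(skewed))    # 2.0
```

With four tasks the best case is a speedup of 4; once one task dominates, the other cores sit idle most of the time.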
2. GROUP
PARALLEL ENCODING
FRAME TYPES
GOP-Level Parallelism
Downsides:
• Large shared memory requirements
• Poor scalability: only a limited number of groups of pictures (GOPs) can be decoded in parallel
• Not completely coarse-grained: I-frames at GOP borders must be shared or redundantly processed
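A rough sketch of GOP-level work distribution, assuming each GOP starts at an I-frame and can be handled independently (the border I-frame sharing noted above is ignored). The names `split_into_gops`, `process_gop`, and `process_parallel` are hypothetical, and `process_gop` is only a placeholder for real decoding work.

```python
from concurrent.futures import ThreadPoolExecutor


def split_into_gops(frame_types):
    """Split a string of frame types (e.g. 'IBBPBBP') at each I-frame."""
    gops = []
    for index, frame_type in enumerate(frame_types):
        if frame_type == "I" or not gops:
            gops.append([])
        gops[-1].append((index, frame_type))
    return gops


def process_gop(gop):
    # Placeholder for per-GOP work; returns the frame indices it handled.
    return [index for index, _ in gop]


def process_parallel(frame_types, workers):
    """Hand each GOP to a worker; results come back in GOP order."""
    gops = split_into_gops(frame_types)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(process_gop, gops))


stream = "IBBPBBPIBBPBBPIBBP"
results = process_parallel(stream, workers=8)
print(len(results))  # 3 GOPs, so at most 3 of the 8 workers are busy
```

The number of GOPs available at once, not the number of cores, caps the parallelism, and every in-flight GOP needs its own frame buffers.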
3. Frame-Level Parallelism
Downsides:
• Earlier MPEG standards did not, but H.264 uses B-frames for reference; the compression rate suffers if B-frames are encoded independently
• Scalability: usually no more than 1 to 3 B-frames between P-frames
4. SLICES
Slice-Level Parallelism
Downsides:
• No content from one slice can be used to predict information within another slice
• Must be encoded for exactly n processors in order to be decoded by n processors
• Same degradation in compression rate (64 slices increase the bitrate by roughly 34%)
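A small sketch of how a picture might be divided into n slices, one per processor, assuming slices are aligned to whole macroblock rows (H.264 does not require this alignment). The helper name `slice_row_ranges` is hypothetical.

```python
def slice_row_ranges(mb_rows, n_slices):
    """Divide mb_rows macroblock rows into n_slices contiguous ranges.

    Returns a list of (start_row, end_row) pairs, end exclusive.
    Any remainder rows are spread over the first slices.
    """
    if not 1 <= n_slices <= mb_rows:
        raise ValueError("need 1 <= n_slices <= mb_rows")
    base, extra = divmod(mb_rows, n_slices)
    ranges = []
    start = 0
    for i in range(n_slices):
        size = base + (1 if i < extra else 0)
        ranges.append((start, start + size))
        start += size
    return ranges


# A 1920x1088 coded picture has 1088 / 16 = 68 macroblock rows.
print(slice_row_ranges(68, 4))
# [(0, 17), (17, 34), (34, 51), (51, 68)]
```

Because the slice count is fixed at encode time, a stream cut into 4 slices offers a decoder at most 4-way slice parallelism.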
MOTION ESTIMATION
5. Macroblock-Level Parallelism
• Number of independent MBs fluctuates widely (complicated dependency rules)
• Macroblock rows / groups of MBs: less fluctuation, greater memory requirement
• Scalability grows with image resolution
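A sketch of how the number of independent macroblocks varies over time, under a simplified dependency rule that is an assumption here: each MB waits for its left, upper-left, upper, and upper-right neighbours, and every MB takes one time step. Under that rule, MB (x, y) can start at step x + 2y. The function name `wavefront_profile` is hypothetical.

```python
from collections import Counter


def wavefront_profile(mb_cols, mb_rows):
    """Number of MBs that can be processed at each time step.

    Assumes MB (x, y) becomes ready at step x + 2*y, which follows from
    depending on the left, upper-left, upper, and upper-right neighbours.
    """
    counts = Counter(
        x + 2 * y for y in range(mb_rows) for x in range(mb_cols)
    )
    total_steps = mb_cols + 2 * (mb_rows - 1)
    return [counts[t] for t in range(total_steps)]


print(wavefront_profile(4, 2))  # [1, 1, 2, 2, 1, 1]

# 1920x1088 coded picture: 120 x 68 macroblocks.
hd = wavefront_profile(120, 68)
print(max(hd), len(hd))  # 60 254

# 720x576 picture: 45 x 36 macroblocks.
sd = wavefront_profile(45, 36)
print(max(sd))  # 23
```

The profile ramps up, peaks, and ramps down again, and the peak grows with resolution, which matches the fluctuation and scalability points above.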
Macroblock-Level Parallelism
Downsides:
• Entropy decoding cannot be parallelized because earlier stages must be completed first (Amdahl’s Law: speedup is bounded by 1/f, where f is the serial fraction)
• The number of independent MBs does not remain constant (pseudo-bell curve)
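A short sketch of the Amdahl's Law bound mentioned above. The serial fraction used in the example (0.25) is a hypothetical value, not a measured share of entropy decoding.

```python
def amdahl_speedup(serial_fraction, n_cores):
    """Amdahl's Law: speedup = 1 / (f + (1 - f) / n).

    f is the fraction of the work that cannot be parallelized;
    as n grows, the speedup approaches 1 / f.
    """
    if not 0 < serial_fraction <= 1:
        raise ValueError("serial_fraction must be in (0, 1]")
    if n_cores < 1:
        raise ValueError("n_cores must be at least 1")
    return 1.0 / (serial_fraction + (1.0 - serial_fraction) / n_cores)


f = 0.25  # hypothetical serial fraction
for n in (1, 2, 4, 8, 64):
    print(n, round(amdahl_speedup(f, n), 2))
print("upper bound:", 1 / f)
```

Even with 64 cores the speedup stays below 1/f = 4 for this assumed serial fraction.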
Decoding: Macroblock Rows
Encoding: Bit Rate Loss
Speedup: Macroblock Rows