design of memory efficient vlsi architecture for real time multimedia

S.Allwin Devaraj et al., International Journal of Computer & IT [ISSN No.(Print):2320-8074]
DESIGN OF MEMORY EFFICIENT VLSI
ARCHITECTURE FOR REAL TIME MULTIMEDIA
APPLICATION
1
S.Allwin Devaraj, babu.allwin@gmail.com
Department of ECE, Assistant Professor
2
R.Helen Vedanayagi Anita, rhelenanita@gmail.com
Department. of ECE, UG Student
Francis Xavier College of Engineering,Tirunelveli
ABSTRACT:- On-chip memory hierarchy for a
video, contains the data memory and the context
memory organizations for better optimization.
Compressing the memory space is important
aspect in VLSI, in order to reduce the power
consumption, power dissipation and area.HCC
and LDO techniques are used to compress the onchip memory space and thereby reduces the
reconfiguration time and the data reference time.
HCC, the contexts to be constructed in
hierarchical fashion in order to eliminate
repetitive portions of the contexts. In LDO, it
increases the reuse ratio of memory space
automatically and also for an several time
references. The hierarchical storage & re usage for
saving required memory space , without affecting
the performance .In future by increasing the size
of macro blocks from 8X8 into 64x64 in H.265
CODEC to achieve better compression rate and to
increase throughput rate using pipelining
technique.
Keywords: HCC, LDO, CODEC, On-chip memory,
Macro blocks
I. INTRODUCTION
Multimedia technology, improved the quality of
human lives from large device to portable devices,
from home to outdoors ,and from specific people to
everybody. Digital video technology is considered to
be the most important part of multimedia providing
applications such as high definition TV, 3Dgraphics,
digital cinema, camera, and so on. The multimedia
system used to store and/or transmit video data
becomes an essential concern. The key role of a video
system to reduce the video data without losing any of
its quality through an video CODEC(encoder and
decoder),or known as video coding.
© 2015, IJCIT All Rights Reserved
Video consumes 66% of the total Internet data flow,
and that number continues to increase rapidly. Users
want to watch high quality videos but, for an online
video providers, costs of purchasing the network
bandwidth and the storage devices grow increasingly
every year.[9]
Compressing video reduces its file size, it is
important, because smaller files upload faster and
thus it save bandwidth and storage costs, and load
quicker when it played back . Video is composed of a
series of still frames normally 24-60 frames per
second and only part of the image changes from
frame to frame. Instead of storing two nearly-identical
frames in a video file, only the parts of the image to
be changed are recorded. For eg. If you have a friend
waving to the camera, and your friend's arm is the
only thing moving in the shot, and thus the image
information of your friend's arm is the only thing
recorded. The method a computer used for
determining the amount of change between frames is
called the codec.
For every couple of frames, the codec will
pick a key frame that to be serves as a reference for
all the frames after it. H.261and H.263 are now
widely used for real-time video communications in a
network.H.264 is now a widely adopted standard and
its an international standard for video compression.
H.264 provides significant improvements in coding
efficiency, latency, complexity and robustness. H.264
format is more broadly available in network cameras,
video management and video encoders software,
system designers and integrators will need to make
sure that the products and vendors they choose and
support this new open standard.
H.264 is a standard for video compression,
and now it is currently one of the most commonly
used format for recording, compression and
distribution of high definition video. In this re usage
P a g e | 30
S.Allwin Devaraj et al., International Journal of Computer & IT [ISSN No.(Print):2320-8074]
scheme and hierarchical storage are used to the save
of memory space required , without affecting the
performance.[1] In LDO, the on-chip data are
classified into two types, based on the lifetime of
data. The short-lifetime of data are stored in the
FIFO to increase the reuse ratio of memory space
automatically and can be used to pass the value from
one stage to next. And RAM for storing the long life
time data . Inorder to perform inter & intra prediction,
the previous frames and blocks need to be stored for
reference. A H.264 video encoder which carries out
prediction, DCT transform and encoding processes to
produce a compressed H.264 bit stream, H.264 video
decoder carries out the complementary processes of
decoding, inverse transform and reconstruction to
produce a decoded video sequence. The HCC and
LDO techniques used in H.264 CODEC uses the
logic element of 6869 , speed of 153.12MHZ and the
dissipate power of 514.47 mw. And in proposed
HEVC is said to double the data compression ratio
compared to H.264 at the same level of video quality.
And it can alternatively be used to provide improved
video quality at the same bit rate. It can support 8K
UHD and resolution up to 8192x4320.
I. SYSTEM DESIGN
Operation
The video is composed of series of still frames.DCT
transform is performed to first frame of video after
transformation it stored as reference in RAM and then
the next frame is get processed for DCT and then the
difference between the frames is get predicted by
motion estimation i.e the reference between the
different types of frame are realized by a process
called motion estimation. Motion estimation used to
estimates the residual value obtained from DCT and
motion compensation predicts which prediction is
performed to frames. The use of deblocking filter is to
improve the visual quality and prediction
performance by smoothening sharp edges between the
macro block. IDCT is performed to recover the
original frame. In this also HCC and LDO technique
used in H.265 CODEC thereby achieving the
reduction in area and dissipation of power and
increases speed when compared to H.264.
© 2015, IJCIT All Rights Reserved
INPUT VIDEO
VIDEO TO FRAME CONVERTER
FRAME 1
FRAME 2
APPLY DCT
APPLY DCT
REFERENCE
FRAME
MOTION
ESTIMATION
COMPARE
PERFORM INTER OR INTRA PREDICTION
IDCT
DEBLOCKING FILTER
RECOVERED FRAME
RAM(ON CHIP MEMORY
)
Fig 1: Flow diagram of proposed
II. DETAILED SYSTEM DESIGN
Context memory : The context memories store the
contexts, and get accessed during the configuration.
The context memory organization is crucial in the
reconfigurable system, because affects the size of
contexts and reflects the configuration.[1] The
context size determines both the reconfiguration
overhead and the silicon area. The smaller the
contexts, and it smaller the memory space required
for the contexts. Smaller the size of contexts assist in
reducing the transfer delay of contexts from off-chip
memory to on-chip memory. Overhead in the
communication heavily depends on the context
memory organization.
P a g e | 31
S.Allwin Devaraj et al., International Journal of Computer & IT [ISSN No.(Print):2320-8074]
the current macro block or block and the result of the
residual is compressed and transmitted to the decoder,
together with the information required for the decoder
to repeat the prediction process. The decoder creates
an identical prediction and adds this to the decoded
residual or block.
CONTEX
T
MEMOR
Y
on chip
DCT
CURRENT
TRANSFOR
M
FRAM
E
REFERENCE
FRAM
E
INTRA &
INTER
PREDICTIO
N
DEBLOCKING
FIF
O
FILTE
R
off chip
MOTION
COMPENSATION
on chip
MOTION
ESTIMATIO
N
Inter prediction : It aims to remove temporal
redundancies in a video sequence. Inter prediction
macro blocks must reside in P-slices and require an
history of previously encode frames and it to be kept
in memory. The availability of multiple reference
frames for motion compensation is a new feature in
H.264/AVC standard. [9]For inter prediction a 16x16
macro block to be partitioned into any 4x4 multiple.
Hierarchial configuration context: In nonhierarchical, the context get stored repeatedly and it
directly included in the task context. The main
advantage of HCC is to compress the memory space
required for the contexts. The context are get
constructed in hierarchical fashion in order to
completely eliminate the repetitive portions of the
context and it can be accessed and located
conveniently.
Fig 2: Block diagram
III. RESULTS& DISCUSSION
On-chip memory : On-chip memory is the simplest
type of memory to use in an FPGA-based embedded
system. And it provides the highest throughput,
lowest latency memory to be possible in an FPGAbased embedded system. Advantage of on-chip
memory requires no additional board space or circuitboard wiring because it is implemented directly on
the FPGA. On-chip memory can also saves
development time and cost.
Motion estimation : The references between the
different types of frames are get realized by a process
called motion compensation or motion estimation
.The correlation in terms of motion between two
frames is get represented by a motion vector. The
frame correlation ,and the pixel arithmetic difference
is strongly depends on how the good estimation
algorithm is implemented. Good estimation results in
better quality of the coded video sequence and higher
compression ratio .
Intra prediction : When a block or macro blocks is
coded in intra mode, a prediction block is formed
based on previously encoded and reconstructed blocks
in the same frame. This prediction is subtracted from
© 2015, IJCIT All Rights Reserved
MODELSIM OUTPUT:
Fig 3 shows the simulation output waveform after
processing the frames by performing DCT and
prediction process.
AREA UTILIZATION REPORT: The flow summary
depicts the successful compilation and execution. The
P a g e | 32
S.Allwin Devaraj et al., International Journal of Computer & IT [ISSN No.(Print):2320-8074]
report gives register usage and memory storage of a
system chip design.
Fig 4 the area can be calculated by knowing the total
logic elements and register, memory bits and total
pins.
PERFORMANCE REPORT:
Fig 6 Power play analysis tool allow estimating static
and dynamic power consumption throughout the
design cycle using Quartus software .
IV. COMPARISON RESULTS
TYPES
USED
AREA
SPEED
POWER
Existing
CODEC
6869
153.12MHZ
514.4 mW
Proposed
CODEC
6037
147.73MHZ
477.8 mW
Table shows the trade off analyzes of Video codec
with LDO and HCC methods with QUARTUS II
hardware synthesis using STRATIX III family.
V. CONCLUSION AND FUTURE
WORK
Fig5 shows the speed of the codec using Quartus II
software and the speed value obtained is 147.73
MHZ.
POWER ANALYZER:
© 2015, IJCIT All Rights Reserved
The proposed HCC and LDO techniques are
used to compress the memory space and reduce the
reconfiguration time using H.265 CODEC.The DCT
block , FIFO and RAM for data references are
designed thereby reducing On-Chip data memorysize
without affecting the performance of system and
thereby achieving the area,power and performance
than by using H.264.Future enhancements includes,
modify DCT blocks, prediction blocks and to carried
out pipelining for better throughput and performance
P a g e | 33
S.Allwin Devaraj et al., International Journal of Computer & IT [ISSN No.(Print):2320-8074]
metric analyzes
implementations.
VI.
in
H.265
and
its
FPGA
REFERENCES
[1] Yansheng Wang, Leibo Liu, Shouyi Yin , “On-Chip
Memory Hierarchy in OneCoarseGrained Reconfigurable
Architecture to Compress Memory Space and to
ReduceReconfiguration
Time
and
Data-Reference
Time”,IEEE
transactions
on
very
large
scaleintegrationsystems, vol. 22, no. 5, may 2014 983.
[2] T. Geng, L. Liu, S. Yin, M. Zhu, and S. Wei,
“Parallelization of computing-intensive tasks of the H.264
high profile decoding algorithm on a reconfigurable
multimedia system,” IEICE Trans. Inf. Syst.,vol. E93-D, no.
12, pp. 3223–3231, Jan. 2010.
[3] B. Mei, B. De Sutter, T. V. Aa, M. Wouters, and S.
Dupont“Implementation of a coarse-grained reconfigurable
media processor for AVC decoder,” J. Signal Process. Syst.,
vol. 51, no. 3, pp. 225–243, 2008.
© 2015, IJCIT All Rights Reserved
[4] J. Shield, P. Sutton, and P. Machanick,“Dynamic cache
switching in reconfigurable embedded systems,”, in Proc.
Int. Conf. Field Program Logic Appl., 2007, pp. 111–116.
[5] M. K. A. Ganesan, S. Singh, F. May, and J. Becker,
“H.264 decoder at HD resolution on a coarse grain
dynamically reconfigurable architecture,”in Proc. Int. Conf.
Field Program. Logic Appl., Aug. 2007,pp. 467–471.
[6] M. Suzuki, Y. Hasegawa, V. M. Tuan, S. Abe, and H.
Amano ,“A cost-effective context memory structure for
dynamically reconfigurable processors,” in Proc. 20th Int.
Parallel Distrib. Process. Symp., Apr. 2006, p. 188.
[7] White Paper of Reconfiguration on XPP-III Processor,
PACT Inc., Lisle,IL, USA, 2006.
[8] B. Mei, F-J. Veredas, and B. Masschelein, “Mapping an
H.264/AVCdecoder onto the ADRES reconfigurable
architecture” in Proc. Int. Conf.Field Program. Logic Appl.,
Aug. 2005, pp. 622–625).
[9] Suneetha Kosaraju , ”Novel VLSI Architecture for
Quantization and Variable Length Coding for H264/AVC Video Compression Standard”, theses,2005.
P a g e | 34