Project File Details


File Type: MS Word (DOC) & PDF
File Size: 2,094KB
Number of Pages:196

ABSTRACT

In today's era of modern architectures, high-performance computing has received a lot of
attention at diverse architectural levels. The performance crises in the computing arena have
forced researchers to look for alternative architectures that foster high performance while
minimizing tradeoffs.
The Queue Core processor is a novel 32-bit microprocessor that emerged from the urgent
desire for a high-performance microprocessor. The advent of the Queue Core processor has curbed
the performance crises through features such as high Instruction Level Parallelism
(ILP), dense program size, low power consumption, and elimination of the register concept.
However, the demand for further improving the performance of the Queue Core processor has
consequently increased its complexity. This has imposed many constraints on the
evaluation of the Queue Core processor with regard to timing, computation, etc. Understanding the
internal dynamic mechanisms of the Queue Core processor and exploring its design space
therefore rely extensively on simulation tools, which are traditionally software.
In this research, we propose QSIM, a trace-driven, runtime simulator for the Queue Core
processor. QSIM implements only a subset of the Queue Core Instruction Set Architecture,
using the Java programming language.
This research work lies at the crossroads of two different domains: System Architecture and
Software Engineering. The research is therefore carried out methodically, drawing on
background knowledge of the Queue Core system architecture and applying Software
Engineering principles.
Among other benefits, QSIM principally offers an attractive opportunity to quickly evaluate
the performance of the Queue Core processor and serves as a tool to explore more Queue
computation.

TABLE OF CONTENTS

ACKNOWLEDGMENT
DEDICATION
CONTENTS
LIST OF FIGURES
LIST OF APPENDIX
LIST OF ABBREVIATIONS
ABSTRACT
CHAPTER 1
1.0 INTRODUCTION
1.1 Overview of Queue Computing
1.2 Overview of Queue Core Processor
1.2.1 Produced Order Queue Computation Model
1.2.2 Queue Core Architecture
1.3 Research Problem Statement
1.4 Research Goal
1.5 Research Objectives
1.6 Expected Output of Research
1.7 Overview of Simulators
CHAPTER 2
2.0 Related Works
2.1 Overview of Works on Queue Core Processor
2.1.1 Queue Core Instruction Set Architecture
2.1.2 Queue Core Special Purpose Registers
2.1.3 Synthesis of QC-2
2.1.4 Queue Core Compiler Design
2.1.5 Queue Core Assembler
2.1.6 Current State of the Art of Queue Core Processor
CHAPTER 3
3.0 Simulator Design Infrastructure
3.1 Infrastructure of Instruction Processing Stages
3.2 Datapath Infrastructure of QC-2
3.3 Queue Register Infrastructure
3.4 Software Engineering Infrastructure of QSIM
3.4.1 UML Class Diagram
3.4.2 UML Package Diagram
3.4.3 Analysis of UML Package Diagrams
3.4.3.1 QSIM.Register Package
3.4.3.2 QSIM Package
3.4.3.3 QSIM.Util Package
3.4.3.4 QSIM.Gui Package
3.4.4 Software Life Cycle Model
3.4.5 Development Environment and Language Infrastructure
CHAPTER 4
4.0 Results and Discussion
4.1 Benefits of QSIM
4.2 QSIM Features
4.2.1 QSIM Splash Screen
4.2.2 QSIM User Interface
4.2.3 Tools and Menu Bar Components
4.2.4 QSIM Registers Window
4.2.5 File Edit Window
4.3 QSIM Computation Procedure
4.4 QSIM Program Execution
4.4.1 QC-2 Assembly Program
4.4.2 Assembly Language Execution
4.4.3 QSIM Execution Statistics
CHAPTER 5
5.0 Conclusion and Future Works
5.1 Conclusions
5.2 Future Works on QSIM
REFERENCES

CHAPTER ONE

1.0 INTRODUCTION
1.1 OVERVIEW OF QUEUE COMPUTING
Nowadays, the shift in Hardware and Software technology has compelled designers and users
to look at micro-architectures that process instruction streams with high performance, low
power consumption, and short program length. In striving for high performance, micro-architecture
researchers have emphasized Instruction-Level Parallelism (ILP) processing,
which has been established in superscalar architectures without major changes to Software.
Since the program contains no explicit information about available ILP, it must be discovered
by the Hardware, which must then also construct a plan of actions for exploiting parallelism.
In short, computers have so far achieved this goal at the expense of tremendous Hardware
complexity, a complexity that has grown so large as to challenge the industry's ability to
deliver ever-higher performance.
Queue Computing emerged as a new paradigm and an attractive alternative for meeting the
pressing demand for high performance in micro-architectures.
Queue Computing, in simpler terms, is the use of Queues in the processing/computation of
data. Queue Computing uses the Queue Computation Model for its data processing.
It has received much attention in recent years as an alternative architecture due to interesting
features such as high parallelism capabilities, a smaller instruction set, low complexity etc.
Queue Computation Model (QCM) refers to the evaluation of expressions using a First-In
First-Out (FIFO) Queue [9, 11], called the Operand Queue. In this model, operands are inserted,
or Enqueued, at the Tail of the Queue and removed, or Dequeued, from the Head of the
Queue. Two references are needed to track the location of the Head and the Tail of the
Queue: the Queue Head (QH) points to the head of the Queue, and the Queue Tail (QT)
points to the rear of the Queue.
Development of Queue Assembler and Runtime Simulator: QSIM- Debut Version
The concept of Enqueueing and Dequeueing extends to the operations allowed by the QCM.
For example, binary and unary operations require two and one operands, respectively, to be
Dequeued. After the operation is performed, the result is Enqueued back to the FIFO queue,
as illustrated in the diagram below (a+b and (a+b)/c are Enqueued after the ADD and DIV
operations respectively). We say that binary and unary operations Consume (Dequeue) and
Produce (Enqueue) data. Some types of operations are Produce-only, such as the Load operation.
Other types of operations are Consume-only, such as the Store operation. Queue Code is defined
as the set of instructions executed by the QCM.
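The produce/consume behaviour described above can be sketched in a few lines of Java. This is only an illustrative sketch, not QSIM code: the class name QueueModelSketch, the mnemonics ADD, MUL and NEG, and the convention that an integer literal stands for a produce-only Load are all assumptions made for the example. The expression (2 + 3) * 4 is our own choice.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.List;

// Minimal sketch of the Queue Computation Model: operands are enqueued at the
// tail (QT) and dequeued from the head (QH) of a FIFO operand queue.
class QueueModelSketch {

    // Evaluates a queue program. Tokens are either "ADD", "MUL", "NEG"
    // (hypothetical mnemonics for this sketch) or an integer literal,
    // which stands for a produce-only Load of that value.
    static int evaluate(List<String> program) {
        Deque<Integer> queue = new ArrayDeque<>();
        for (String token : program) {
            switch (token) {
                case "ADD": { // binary: consume two, produce one
                    int first = queue.removeFirst();
                    int second = queue.removeFirst();
                    queue.addLast(first + second);
                    break;
                }
                case "MUL": { // binary: consume two, produce one
                    int first = queue.removeFirst();
                    int second = queue.removeFirst();
                    queue.addLast(first * second);
                    break;
                }
                case "NEG": // unary: consume one, produce one
                    queue.addLast(-queue.removeFirst());
                    break;
                default: // produce-only: enqueue the literal at the tail
                    queue.addLast(Integer.parseInt(token));
            }
        }
        // The final result is the value left at the head of the queue.
        return queue.removeFirst();
    }

    public static void main(String[] args) {
        // (2 + 3) * 4: load 2, 3, 4; ADD enqueues 5 behind 4; MUL gives 20.
        System.out.println(evaluate(List.of("2", "3", "4", "ADD", "MUL")));
    }
}
```

Note that no instruction names a register: every operand location is implied by the order in which data was produced.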
Fig 1 below shows the evaluation of the expression (a+b)/c using the Queue Computation
Model.
[Figure: a FIFO queue with QT at the IN/ENQUEUE end and QH at the OUT/DEQUEUE end. Steps shown: ENQUEUE a; ENQUEUE b; ENQUEUE c; ADD (DEQUEUE a and b, ENQUEUE a+b); DIV (DEQUEUE c and a+b, ENQUEUE (a+b)/c).]
Fig 1: QUEUE COMPUTING EVALUATION
1.2 OVERVIEW OF THE QUEUE CORE PROCESSOR
The quest for high performance in traditional register architectures led to the development of
complex compiler and Hardware techniques for the exploitation of Instruction Level Parallelism
(ILP) [6] in microprocessors. Instruction Level Parallelism is the key to improving the performance
of modern architectures. Several Hardware and compiler techniques have been proposed,
including the provision of more architectural registers, modulo scheduling, and the use of register
windows, among others, in order to improve the performance of related architectures [5].
Despite the relentless efforts behind these techniques, performance remained a major setback
in those architectures. This has accelerated the urgent desire of microprocessor designers to
seek alternative architectures that can extract maximum parallelism and foster high
performance while minimizing tradeoffs.
The Queue Core Processor emerged as an alternative microprocessor to combat the performance
crises in the Computing world. It is an entirely new Processor based on the Produced Order
Queue Computation Model.
The key ideas of the Produced Order Queue Computation Model are the operand and result
manipulation schemes.
1.2.1 PRODUCED ORDER QUEUE COMPUTATION MODEL
The Produced Order Queue Computation Model uses a circular Queue Register (Operand Queue)
instead of Random Access Registers to store intermediate results. Data is inserted in the Queue
Register (QREG) in produced order and can be reused.
A special register, called the Queue Head pointer (QH), points to the first data in the QREG.
Another pointer, named the Queue Tail pointer (QT), points to the location of the QREG in which
the result is stored. Data Dequeued at QH are called Consumed data, while data Enqueued at
QT are called Produced data. In this model, Produced data can be reused
(hence the name Produced Order Queue Computation Model).
A Live Queue Head pointer (LQH) is also used to keep used data that could be reused and thus
should not be overwritten. These data, which are found between the QH and LQH pointers, are called
Live-Data (data still reusable). The Live-Data entries in the QREG are statically controlled.
Dead-Entries signify data that are no longer needed and can be overwritten. Empty-Entries
contain no data.
Two special instructions are used to stop or release the LQH pointer. The stplqh (stop LQH)
instruction stops the automatic movement of LQH, while the autlqh (automatic
LQH) instruction releases the LQH pointer.
Immediately after the data is used, QH is incremented so that it points to the data for the next
instruction. QT is also incremented after the result is stored.
The diagram below shows the structure of the Circular Queue Register (QREG), including QH,
QT, LQH, Live Entries and Dead Entries.
FIG 2: CIRCULAR QUEUE REGISTER STRUCTURE
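The pointer behaviour of the circular QREG can be illustrated with a small Java sketch. It is not taken from QSIM: the class name CircularQregSketch and its methods are hypothetical, and it assumes that LQH simply follows QH while it moves automatically, stays put after stplqh, and catches up with QH on autlqh. Overflow and underflow checks are left out for brevity.

```java
// Sketch of a circular Queue Register (QREG) with QH, QT and LQH pointers.
// Assumption: while LQH moves automatically it simply follows QH; after
// stplqh it stays put so the entries between LQH and QH remain live.
class CircularQregSketch {
    private final int[] entries;
    private int qh;   // next entry to consume
    private int qt;   // next entry to receive produced data
    private int lqh;  // oldest entry that must not be overwritten
    private boolean lqhAutomatic = true;

    CircularQregSketch(int size) {
        entries = new int[size];
    }

    // Produce: store at QT, then advance QT around the ring.
    void enqueue(int value) {
        entries[qt] = value;
        qt = (qt + 1) % entries.length;
    }

    // Consume: read at QH, then advance QH (and LQH when automatic).
    int dequeue() {
        int value = entries[qh];
        qh = (qh + 1) % entries.length;
        if (lqhAutomatic) {
            lqh = qh;
        }
        return value;
    }

    // Stop the automatic movement of LQH.
    void stplqh() { lqhAutomatic = false; }

    // Release LQH; in this sketch it immediately catches up with QH.
    void autlqh() { lqhAutomatic = true; lqh = qh; }

    int qh() { return qh; }
    int qt() { return qt; }
    int lqh() { return lqh; }

    public static void main(String[] args) {
        CircularQregSketch qreg = new CircularQregSketch(4);
        qreg.enqueue(10);
        qreg.enqueue(20);
        qreg.stplqh(); // keep 10 and 20 live for later reuse
        int sum = qreg.dequeue() + qreg.dequeue();
        qreg.enqueue(sum);
        System.out.println("QH=" + qreg.qh() + " QT=" + qreg.qt() + " LQH=" + qreg.lqh());
    }
}
```

In this run QH and QT advance past the consumed and produced entries while LQH stays at entry 0, so the two consumed values remain Live-Data until autlqh is issued.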
The Queue Core Processor offers attractive features. Code size is reduced because operands
are implicitly specified. The concept of registers does not exist, which eliminates register
renaming and avoids false dependencies.
The extraction of maximum Instruction Level Parallelism is a key feature, since its programs
are constructed from a Level-order Traversal. It has a small instruction window and consumes
less power. These features are depicted below:
EXPRESSION: X = (a+b)/(c-d)
QUEUE PROGRAM:
LOAD a
LOAD b
LOAD c
LOAD d
ADD
SUB
DIV
STORE X
LEVEL-ORDER TRAVERSAL:
L0: X
L1: /
L2: + -
L3: a b c d
[Figure annotations: low power, small instruction window, no register renaming, no false dependencies, maximum parallelism, high performance]
FIG 3: QUEUE EXPRESSION EVALUATION
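The queue program in Fig 3 can be derived mechanically from the expression tree by emitting the deepest level first and the root last. The Java sketch below illustrates the idea under simplifying assumptions: the Node class and queueProgram method are hypothetical names, and the approach relies on every operation finding its operands on the level directly beneath it, as in Fig 3. Less regular trees need the operand offsets described in section 1.2.2.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

// Sketch: derive a queue program from an expression tree by visiting the
// levels from the deepest one up to the root, left to right within a level.
class LevelOrderSketch {

    // Hypothetical tree node: leaves carry a variable name, inner nodes an
    // instruction mnemonic; right may be null for single-operand nodes.
    static final class Node {
        final String label;
        final Node left;
        final Node right;

        Node(String label, Node left, Node right) {
            this.label = label;
            this.left = left;
            this.right = right;
        }

        boolean isLeaf() { return left == null && right == null; }
    }

    static List<String> queueProgram(Node root) {
        // Collect nodes level by level with an ordinary breadth-first walk.
        List<List<Node>> levels = new ArrayList<>();
        Deque<Node> frontier = new ArrayDeque<>();
        frontier.add(root);
        while (!frontier.isEmpty()) {
            List<Node> level = new ArrayList<>(frontier);
            frontier.clear();
            for (Node node : level) {
                if (node.left != null) frontier.add(node.left);
                if (node.right != null) frontier.add(node.right);
            }
            levels.add(level);
        }
        // Emit the deepest level first so operands are produced before use.
        List<String> program = new ArrayList<>();
        for (int i = levels.size() - 1; i >= 0; i--) {
            for (Node node : levels.get(i)) {
                program.add(node.isLeaf() ? "LOAD " + node.label : node.label);
            }
        }
        return program;
    }

    public static void main(String[] args) {
        // X = (a+b)/(c-d), with the store as the root of the tree.
        Node add = new Node("ADD", new Node("a", null, null), new Node("b", null, null));
        Node sub = new Node("SUB", new Node("c", null, null), new Node("d", null, null));
        Node store = new Node("STORE X", new Node("DIV", add, sub), null);
        System.out.println(queueProgram(store));
    }
}
```

For the tree of Fig 3 this yields the same instruction order as the queue program listed in the figure.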
1.2.2 QUEUE CORE ARCHITECTURE
The QC-2 Processor is a 32-bit microprocessor which implements a Produced Order Instruction
Set Architecture. All instructions are 16 bits wide, and each instruction can encode at most two (2)
operands specifying the locations in the operand queue from which to read the operands.
Below is the Queue Core Processor architecture block diagram, showing the core
modules and how they are interconnected. The block diagram primarily contains the Queue Core
Memory, Instruction Pipeline Units, Execution Units and Architectural Registers (Circular Queue
Registers).
[Block diagram. Modules shown: PROG/DATA MEMORY; INSTRUCTION FETCH UNIT (FU), fetching 4 x 16-bit instructions per cycle; INSTRUCTION DECODE UNIT (DU); QUEUE COMPUTATION UNIT (QCU); BARRIER AND ISSUE UNITS; EXECUTION UNIT with LD/ST (2 FU), SET (4 FU), ALU (4 FU), FPU (2 FU), MUL (1 FU) and BRAN (1 FU); Architectural Registers (CIRCULAR QUEUE REGISTER). Four instructions pass between pipeline units per cycle. Other labels: Source bus, Data bus, Load to QREG, Ld-St from/to MEM, Renew QH, QT, LQH.]
FIG 4: QUEUE CORE PROCESSOR ARCHITECTURE
The Queue Core Processor has six Pipeline Stages combined with five pipeline buffers to
smooth the flow of instructions through the pipeline [3].
Fetch Unit (FU): The pipeline begins with the Fetch Stage, which fetches instructions from the
Instruction Memory and delivers four instructions to the Decode Unit each cycle.
Decode Unit (DU): The DU decodes four instructions in parallel during the second phase and
writes them into the Decode buffer. This stage also calculates the number of Consumed (CNBR)
and Produced (PNBR) data for each instruction. The CNBR and PNBR are used by the next
pipeline stage to calculate the source (source1 and source2) and destination locations for each
instruction.
Queue Computation Unit (QCU): The QCU is a unique computation unit that distinguishes the
Queue Core Processor's pipeline stages from those of register-based Processors. Four instructions
arrive at the QCU each cycle. The QCU calculates the first operand (source1) and destination
(DEST) addresses for each instruction. These addresses are
calculated dynamically by the Hardware mechanism in the QCU.
The figure below shows the Hardware mechanism for source1 address calculation by the QCU:
[Figure: hardware diagram relating CNBR, PNBR, QH0, QH1, QHn+1, NQH, QT0, QT1, QTn+1 and NQT.]
DESCRIPTION
PNBR: Number of Produced data
CNBR: Number of Consumed data
QH0: Initial Queue Head Value
QT0: Initial Queue Tail Value
NQH: Next Queue Head Value
NQT: Next Queue Tail Value
QHn+1: Next Queue Head Value
QTn+1: Next Queue Tail Value
FIG 5: SOURCE 1 ADDRESS HARDWARE
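One plausible reading of the source1 and DEST calculation is a running update of the head and tail values: each instruction reads at the current QH and writes at the current QT, after which QH advances by CNBR and QT by PNBR. The Java sketch below assumes exactly that recurrence, QH(n+1) = QH(n) + CNBR(n) and QT(n+1) = QT(n) + PNBR(n), wrapped around the circular QREG. The class name, the array layout and the QREG size of 256 are illustrative assumptions, not details confirmed by the text.

```java
// Sketch of the QCU address calculation, assuming the recurrence
// QH(n+1) = QH(n) + CNBR(n) and QT(n+1) = QT(n) + PNBR(n), modulo the
// QREG size. The queue size and array layout are illustrative only.
class QcuAddressSketch {

    // Returns one row per instruction: {source1, dest}.
    static int[][] addresses(int qh0, int qt0, int[] cnbr, int[] pnbr, int qregSize) {
        int[][] result = new int[cnbr.length][2];
        int qh = qh0;
        int qt = qt0;
        for (int n = 0; n < cnbr.length; n++) {
            result[n][0] = qh; // source1 is read at the current head
            result[n][1] = qt; // the result is written at the current tail
            qh = (qh + cnbr[n]) % qregSize; // next head
            qt = (qt + pnbr[n]) % qregSize; // next tail
        }
        return result;
    }

    public static void main(String[] args) {
        // Four instructions: two loads (consume 0, produce 1), an add
        // (consume 2, produce 1) and a store (consume 1, produce 0).
        int[] cnbr = {0, 0, 2, 1};
        int[] pnbr = {1, 1, 1, 0};
        for (int[] row : addresses(0, 0, cnbr, pnbr, 256)) {
            System.out.println("source1=" + row[0] + " dest=" + row[1]);
        }
    }
}
```

In the real pipeline the four calculations of a cycle happen in parallel in Hardware; the sequential loop here only shows the arithmetic.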
Barrier and Issue Units: These are other special units that distinguish the Queue Core
Processor from others. The Barrier Unit inserts barrier flags for dependency resolution.
The Issue Unit (IU), on the other hand, issues four (4) instructions for execution each cycle. In this
stage, the second operand (source2) of a given instruction is first calculated by adding the
address of source1 to the displacement that comes with the instruction. The second operand
address calculation could be performed in the QCU stage; however, for a balanced pipeline,
source2 is calculated at the beginning of the Issue Stage.
The Hardware mechanism for the source2 address calculation is shown in Fig 6 below.
The Processor reads the operands from the Queue Register, and execution begins in the next
stage.
Execution Unit (EU): The macro data flow execution core consists of four Integer ALU units,
two Floating-point units, one Branch unit, one Multiply unit, four Set units, and two Load/Store
units. These units are shown in the QC-2 block diagram above (Fig 4). The results from the
execution phase are either stored in Memory or written into the Queue Register.
It is worth mentioning that the Circular Queue Register is one of the features that most
distinguishes the QC-2 from other Processor architectures. It totally eliminates the need for
random access registers, hence eliminating false dependencies and register renaming.
[Figure: for instructions n-1 and n, QH is paired with SRC 1, QT with DEST, and OFFSET with SRC 2.]
DESCRIPTION
OFFSET: +/- integer value that indicates the location of SRC 2 (n-1) relative to QH (n-1)
QT (n): Queue Tail value of instruction n
DEST (n): Destination location of instruction n
SRC 1: Source address 1 of instruction (n-1)
SRC 2: Source address 2 of instruction (n-1)
FIG 6: SOURCE 2 ADDRESS HARDWARE
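The source2 calculation described above, source1 plus a signed offset, can be written as a one-line helper. The sketch below is an illustration only: the class name and the QREG size parameter are assumptions, and the wrap-around at the ends of the circular register is our reading of how a negative or overflowing address would be handled.

```java
// Sketch of the source2 calculation: source2 = source1 + offset, wrapped
// around the circular QREG. The QREG size is an illustrative parameter.
class Source2Sketch {

    static int source2(int source1, int offset, int qregSize) {
        // floorMod keeps the address in range for negative offsets too.
        return Math.floorMod(source1 + offset, qregSize);
    }

    public static void main(String[] args) {
        System.out.println(source2(5, 1, 16));  // 6
        System.out.println(source2(0, -2, 16)); // 14
    }
}
```

Math.floorMod is used instead of the % operator because % would return a negative remainder for a negative sum.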
1.3 RESEARCH PROBLEM STATEMENT
The Queue Core Processor has demonstrated high performance against conventional microprocessors
(RISC & CISC).
As the complexity of the QC-2 increases due to the desire to further enhance performance and
throughput, there is an apparent need to explore the QC-2 and incorporate other features. The
dynamics and characteristics of the QC-2 will have to be studied based on the newly added features. It
becomes pertinent to execute and debug Queue programs in order to evaluate Queue
computations. The performance of the Queue Processor also needs to be evaluated.
Undoubtedly, the actual QC-2 Hardware will not be the best platform on which to evaluate these features.
Hence, we need a virtual system in which to implement the above features and functionalities before
effecting them on the actual Hardware.
Many modern Processor architectures, unlike the Queue Core Processor, have simulators that
emulate them and provide these functionalities.
This thesis was therefore inspired by the need to design a simulator that will emulate the Queue
Core Processor.
1.4 RESEARCH GOAL
The goal of this research is to design a Runtime Simulator for the Queue Core Processor.
1.5 RESEARCH OBJECTIVES
This research seeks to achieve the following objectives:
- To study and understand the general underlying principles of simulators and their
  benefits, especially in the design of microprocessors
- To study and understand the internal dynamic mechanism of the QC-2 microprocessor
- To model the system architecture of the QC-2 microprocessor
- To design the conceptual model of the QC-2 simulator (QSIM)
- To build, based on the conceptual design, a simulator that will emulate the QC-2
  microprocessor and implement a subset of the QC-2 Instruction Set Architecture
1.6 EXPECTED OUTPUT OF THIS RESEARCH
When this research is completed successfully, QSIM is expected to:
- be designed for the QC-2 Processor
- be able to load and execute Assembly Language programs for the QC-2 Processor
- provide the enabling platform for studying the interactions between the
  components of the QC-2 Processor
- offer an attractive opportunity to quickly evaluate the performance of the Queue
  Processor
- serve as a tool to explore more Queue computation
1.7 OVERVIEW OF SIMULATORS
Simulation has become a useful and integral part of the modeling of many natural systems in a
wide range of fields, including Manufacturing, Education, Health, Technology etc. Because the
term is used so diversely, it has been defined differently in many contexts.
In [18], simulation is described as the imitation of some real thing, state of affairs or process.
According to [21], it is the artificial creation of conditions in order to study or experiment with
something that could exist in reality.
Simulation has been viewed in [20, 22] as the mathematical representation of the interaction between
real-world objects.
In spite of the varied use of the terminology, it can be coherently viewed as the act of depicting
the structure and functionality of a real system using a virtual system.
The expanding role played by simulation in different arenas of life has triggered the desire of
computer scientists to embrace the concept. Computer simulation has therefore been defined as
the use of a computer-generated system to represent the dynamic responses, structure and
behavior of a real or proposed system.
In Computing, simulation becomes indispensable when we need to
explore and gain insight into new technology and to estimate the performance of systems too
complex for analytical solutions. It is used to show the eventual effects of alternative conditions and
courses of action. More importantly, when the real system cannot be engaged, simulation
becomes the ultimate choice.
A simulator can be described as the tool that provides the needed platform for simulation.
The term Computer Simulator stems from computer simulation and can be viewed as a piece
of Software that models a computer device or predicts outputs and performance metrics for a given
input.
A computer simulator provides the opportunity to make a virtual representation of reality and
a way to study the dynamics and behavior of this virtual system, which depicts the real
system.
The tool has attracted a wide range of uses in the computer science domain as complexity in
modern architectures rises due to new technological trends. To keep up with the current
trend, microprocessor designers have adopted the concept. The term emulator has been
used to describe computer simulators that simulate microprocessors. The emulator enables
designers to evaluate complex Hardware designs without building them. It helps to assess non-existent
components or systems. Detailed performance metrics can be obtained through this tool. More
importantly, it allows the debugging of micro-programs. The dynamics and characteristics of a
proposed system can be explored in detail. Because they are implemented in Software, not
silicon, emulators can be easily modified to add new instructions or to build new systems such as
multiprocessors.
In this respect, many simulators have been built over the years for different microprocessors.
These simulators, among other benefits, are able to read the instructions of a particular
microprocessor and execute those instructions based on the internal mechanisms of such
simulators.
