Mohammad Hossein Samavatian
Computer Engineering Department, Sharif University of Technology, Azadi Ave, Tehran, Iran
(+98) 9183193525 (+98)9385631523 (+98)(8138224485)
Samavartian@ce.sharif.edu
Mh.samavatian@gmail.com
Single, Iranian
OBJECTIVE
Seeking a PhD position.
EDUCATION
MSc in Computer Engineering - Computer Architecture
2011 to 2013
Sharif University of Technology
GPA: 3.79
BSc in Computer Engineering - Hardware
2007 to 2011
Amirkabir University of Technology (Tehran Polytechnic)
GPA: 3.33 (last two years: 3.73, last three years: 3.73)
High school diploma in Mathematics and Physics at the School of Exceptional Talents (NODET)
2003 to 2007
AllameHelli of Hamedan
INTERESTS
GPGPU, Multi-cores and Many-cores Architecture and Programming.
High Performance Systems Architecture.
Interconnection Networks.
Embedded System Design.
Quantum Computers and Reversible Logic.
PUBLICATIONS
Mohammad Hossein Samavatian, Mohammad Arjomand, Ramin Bashizade and Hamid Sarbazi-Azad, “Architecting the Last-Level Cache for GPUs Using STT-RAM Technology,” ACM Transactions on Design Automation of Electronic Systems (TODAES), in press, 2015.
Mohammad Hossein Samavatian, Hamed Abbasitabar, Mohammad Arjomand and Hamid Sarbazi-Azad, “An Efficient STT-RAM Last Level Cache Architecture For GPUs,” DAC 2014, San Francisco, CA, USA. (ACM)
Mahboobeh Houshmand, Morteza Saheb Zamani, Mehdi Sedighi, Mohammad Hossein Samavatian, “Automatic Translation of Quantum Circuits to One-Way Quantum Computation Patterns,” QINP, 2014.
Mahboobeh Houshmand, Mohammad Hossein Samavatian, Morteza Saheb Zamani, Mehdi Sedighi, “Extracting One-way Quantum Computation Patterns from Quantum Circuits,” International Symposium on Computer Architecture and Digital Systems (CADS), Iran, 2012. (IEEE)
PROJECTS

A Novel STT-RAM Architecture for Last Level Shared Caches in GPUs (M.Sc. Thesis), 2012-2013.
o Supervisor: Prof. Hamid Sarbazi-Azad.
GPGPUs have a high processing capacity and require a large, high-speed memory shared between thread-processor
clusters. Replacing SRAM with Spin-Transfer Torque RAM (STT-RAM) can therefore significantly reduce power
consumption and give a linear increase in memory capacity. In a GPGPU, a many-core processor that executes
threads in parallel, the advantages of STT-RAM technology, such as low read latency and high density, can be very
effective. However, STT-RAM guarantees a reduction in application run time and a growth in thread throughput only
when write operations are managed and scheduled so that they impose the least overhead on read operations. The
purpose of this thesis is to propose and evaluate an STT-RAM architecture for the last-level cache (LLC) in GPGPUs
that uses circuit-level and architecture-level techniques to manage accesses to the LLC. First, a hybrid architecture
was introduced by reducing the retention time of STT-RAM cells. Then, based on a characterization of GPGPU
workloads, cache parameters such as cache micro-architecture, data retention time, latency, and energy consumption
were calculated. Finally, the target architecture was simulated over different design explorations, and the latency and
power consumption of the cache were measured. The proposed architecture yields a performance gain of 16% on
average and 100% at maximum, with a 20% power saving. On the other hand, with the techniques used for data
searching in the cache, power consumption is reduced by 40% while the performance improvement drops from 16% to 15%.

ADVANCED VLSI course project, spring 2012:
o A complete design, synthesis, and simulation flow was carried out in this project:
RTL simulation with Mentor ModelSim.
Processor synthesis with Synopsys Design Compiler.
Post-synthesis simulation: the synthesized processor was simulated again in ModelSim using the netlist file, with and without the SDF file.
Equivalence check of the synthesized processor against the RTL description with the Formality tool.
Placement and routing with Cadence SoC Encounter.
Post-layout simulation with Synopsys HSIM.
RECONFIGURABLE COMPUTING course project, spring 2012:
Performance-Aware Simulated Annealing Clustering Algorithm (PASACA): In this project we introduced a novel
method, PASACA, for clustering logic elements (LEs) in FPGA circuits. It uses the simulated annealing algorithm
with a proposed cost function in order to improve circuit performance. PASACA reads BLIF files containing the LEs,
primary inputs, and primary outputs, runs its clustering algorithm, and generates a net file to be passed to the VPR
program. The clustering results of PASACA on the test benches are compared with those of T-VPack.
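The clustering loop described above can be illustrated with a minimal sketch. This is not the PASACA implementation: the cost function here (the number of nets that cross cluster boundaries) is a hypothetical stand-in for the project's performance-aware cost, and the names `cut_cost` and `anneal_clusters`, the swap move, and the cooling parameters are all assumptions made for illustration.

```python
import math
import random


def cut_cost(nets, assignment):
    """Hypothetical cost: count the nets whose LEs fall in more than one cluster."""
    return sum(1 for net in nets if len({assignment[le] for le in net}) > 1)


def anneal_clusters(num_les, nets, cluster_size, steps=20000,
                    t0=2.0, alpha=0.9995, seed=0):
    """Cluster LEs 0..num_les-1 by simulated annealing over pairwise swaps."""
    rng = random.Random(seed)
    # Initial solution: fill clusters in index order.
    assignment = [i // cluster_size for i in range(num_les)]
    cost = cut_cost(nets, assignment)
    best, best_cost = assignment[:], cost
    temp = t0
    for _ in range(steps):
        a, b = rng.sample(range(num_les), 2)
        if assignment[a] == assignment[b]:
            continue  # swapping inside one cluster changes nothing
        # Swapping two LEs keeps every cluster at its original size.
        assignment[a], assignment[b] = assignment[b], assignment[a]
        delta = cut_cost(nets, assignment) - cost
        # Always accept improvements; accept worse moves with probability exp(-delta/T).
        if delta <= 0 or rng.random() < math.exp(-delta / temp):
            cost += delta
            if cost < best_cost:
                best, best_cost = assignment[:], cost
        else:
            # Rejected: undo the swap.
            assignment[a], assignment[b] = assignment[b], assignment[a]
        temp = max(temp * alpha, 1e-6)  # geometric cooling schedule
    return best, best_cost
```

Calling `anneal_clusters(8, [(0, 4), (1, 5), (2, 6), (3, 7)], cluster_size=4)` returns a cluster index per LE together with the cost of that assignment. A performance-aware variant would replace `cut_cost` with a timing-driven cost.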


RC Car Automatic Parallel Parking design and implementation, spring 2011.
COMPUTER ARCHITECTURE course project, spring 2009.
o Implementation of basic computer and micro-instruction computer architectures in Verilog, simulated with ModelSim.
Logic circuit simulator in Java, spring 2008.
Simulation of a Universal Asynchronous Receiver/Transmitter (UART) in Verilog, spring 2008.
Simulation of the Windows Command Prompt based on a file system, in C#, fall 2008.
Derivation of Optimal One-Way Quantum Computing (1WQC) Model Pattern from Quantum Circuit Model (B.Sc. Thesis), summer 2011.
o Supervisor: Prof. Morteza Saheb-Zamani.
Quantum computing is a new method of information processing based on quantum mechanics. One model of quantum
computing is measurement-based quantum computing (MBQC), which is divided into two categories: teleportation
quantum computing (TQC) and one-way quantum computing (1WQC). The MBQC model has no equivalent in the
classical world and is based on two features of the quantum world: measurement and entanglement. This project
focuses on the 1WQC model, and its goal is to implement a program that converts a quantum circuit to the 1WQC
model. Computation in the 1WQC model consists of four main commands: preparation, entanglement, measurement,
and correction. In this project, the 1WQC model is derived from a quantum circuit constructed from CNOT, CZ, J(α),
C2NOT, and all one-qubit unitary gates. The input of the program is a quantum circuit in QASM format, and the
1WQC model is implemented as a graph. J(α), CZ, and CNOT gates are added to the 1WQC graph directly. C2NOT
and one-qubit unitary gates are implemented and added to the graph as combinations of the three gates above. X and
Z gates, which appear in the 1WQC model as correction commands, are added to the graph using methods that avoid
creating new auxiliary qubits. Other gates with one or more input qubits are constructed from the default and defined
gates. The output of the program is the 1WQC model in CME standard form, with optimizations such as Pauli
simplification and signal shifting applied to the output model. Finally, the correctness of the program is evaluated with
several test benches, and the time consumed and the depth of the quantum circuit before and after creation of the
1WQC model are analyzed. The program is written in C++ using Microsoft Visual Studio.
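The gate-by-gate translation described above can be sketched as follows. This is an illustrative Python sketch, not the thesis program (which is written in C++ and reads QASM). It assumes the standard measurement-calculus pattern for J(α): prepare a new qubit, entangle it with the current one, measure the current qubit at angle -α, and apply an X correction that depends on the outcome. It also assumes the identity CNOT = (I⊗H)·CZ·(I⊗H) with H = J(0). The function name `circuit_to_pattern`, the tuple encoding of gates and commands, and the omission of C2NOT, general unitaries, and the optimization passes are all choices made for this example.

```python
def circuit_to_pattern(num_qubits, gates):
    """Translate a list of J / CZ / CNOT gates into 1WQC commands.

    Gates:    ("J", wire, alpha), ("CZ", a, b), ("CNOT", control, target)
    Commands: ("N", q)            prepare qubit q
              ("E", q1, q2)       entangle q1 and q2
              ("M", q, angle)     measure q at the given angle
              ("X", q, signal)    X correction on q, conditioned on outcome of `signal`
    Returns the commands in execution order and the qubit holding each output wire.
    """
    wire = list(range(num_qubits))  # logical wire -> qubit currently carrying it
    next_qubit = num_qubits
    commands = []

    def apply_j(w, alpha):
        nonlocal next_qubit
        old, new = wire[w], next_qubit
        next_qubit += 1
        commands.extend([("N", new), ("E", old, new),
                         ("M", old, -alpha), ("X", new, old)])
        wire[w] = new  # the wire now lives on the freshly prepared qubit

    for gate in gates:
        name = gate[0]
        if name == "J":
            apply_j(gate[1], gate[2])
        elif name == "CZ":
            commands.append(("E", wire[gate[1]], wire[gate[2]]))
        elif name == "CNOT":
            control, target = gate[1], gate[2]
            apply_j(target, 0.0)  # H on the target
            commands.append(("E", wire[control], wire[target]))  # CZ
            apply_j(target, 0.0)  # H on the target
        else:
            raise ValueError(f"unsupported gate: {name}")
    return commands, wire
```

For a single CNOT on two wires, `circuit_to_pattern(2, [("CNOT", 0, 1)])` introduces two auxiliary qubits. The resulting pattern is not yet in standard form; moving corrections to the end and applying Pauli simplification or signal shifting would be separate passes.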
WORK AND TEACHING EXPERIENCE

Research Assistant, Prof. Hamid Sarbazi-Azad, Institute for Research in Fundamental Sciences (IPM), Winter 2014 to present

Teaching Assistant, Microprocessor course taught by Dr. Ghasem Miremadi, Sharif University of Technology, Winter 2013

Microcontroller lab instructor, Amirkabir University of Technology
Winter 2012, Winter and Fall 2014

Quantum lab member, Amirkabir University of Technology
Research Assistant, Summer 2011, http://ceit.aut.ac.ir/QDA/members.htm

Novin Rayaneh Hamedan
Internship, Summer 2010

Omid Technologies
Microcontroller Developer, Fall and Winter 2012, http://www.omid.ca/
TECHNICAL EXPERTISE
Programming Languages: C, C++ (expert); C#, Java (familiar)
Hardware Description Languages: VHDL (expert); Verilog (familiar)
Operating Systems: Fedora, Ubuntu, CentOS (expert); SUSE (familiar)
PCB CAD tools: Altium Designer DXP (expert)
EDA tools: Cadence SoC Encounter, Cadence Virtuoso, Synopsys Design Compiler, Synopsys HSIM (familiar)
FPGA and CPLD: VPR, T-VPack, ABC, and FPGA programming (familiar)
Microcontrollers: AVR programming (ATmega 8, 16, 32, 64, 128, 2560/1), CodeVisionAVR, IAR (AVR & ARM) (expert)
Simulation Tools: GPGPU-Sim, NVSim, CACTI, HSPICE, PSpice, ModelSim (expert); Gem5, Proteus (familiar)
HONORS AND AWARDS

Ranked 15th in the PhD entrance exam, 2014.

Ranked 33rd in the MSc entrance exam, 2012.

Ranked 546th in the BSc entrance exam, 2008.

Accepted in the first round of the Chemistry Olympiad in high school, 2006.
LANGUAGE PROFICIENCY

English (fluent), Persian (native)
HOBBIES
Playing volleyball and ping-pong, swimming, mountain climbing, skiing
Cinema and filmmaking
Music and photography
REFERENCES (Available upon request)