Hiroyuki Kitagawa, Professor
Hiroyuki Kitagawa is a full professor at Center for Computational Sciences, University of Tsukuba. He received the Dr.Sc. degree in computer science from the University of Tokyo in 1987. His research interests include databases, data integration, data mining, and information retrieval. He is a fellow of IPSJ and IEICE, and an Associate Member of the Science Council of Japan.
Big Data management and utilization are very important issues in various computational science fields. The Database Group is working on a wide range of Big Data analysis, management, and application problems from a data engineering perspective. Specifically, we are focusing on the following research topics: data processing frameworks for integrating heterogeneous and real-time data, scalable data analysis techniques for large-scale data, knowledge acquisition techniques for large-scale social and scientific data, and open data management techniques. Besides conducting basic research on algorithms and systems for Big Data, we are also collaborating with Divisions of Global Environment Science, Particle Physics, and Life Science in the Center for Computational Sciences, and other research institutes such as International Institute for Integrative Sleep Medicine for further contribution to a wider range of computational science applications.
Fig. 1 Frequent Pattern Mining using GPU
Big Data Processing Frameworks
We are developing the following data processing systems and algorithms for Big Data management and utilization addressing the 3V properties of Big Data (Volume, Variety, and Velocity): (1) data processing frameworks for integrating and processing stream data obtained from sensors, social streams and other data sources, (2) scalable Big Data analysis techniques using massively parallel processing hardware such as GPUs, Intel Xeon Phi, and multi-core CPUs, (3) privacy-preserving and secure data processing schemes for Big Data management, and (4) data processing frameworks for open data including RDF and LOD. Specifically, we have developed a stream processing framework JsSpinner and several query optimization techniques. Furthermore, we have built an OLAP system for streams, named StreamingCube, based on JsSpinner to support interactive analysis of massive stream data from multi-dimensional perspectives. These systems are utilized in Big Data use case studies in several research projects.
Data Mining Algorithms
We have developed various data mining and knowledge discovery techniques including (1) data analysis and mining algorithms for documents, images, and graphs, (2) social data analysis and mining methods, and (3) machine learning algorithms for biological and medical data analysis. In addition, we have proposed several efficient methods for analyzing large-scale graphs and social networks using general purpose computing on graphical processing unit (GPGPU) and many-core processors. The experimental results prove that our methods outperform the state-of-the-art algorithms running on a CPU by a factor of several orders of magnitude.
Scientific Data Management
To deal with rapidly increasing big scientific data, research has been conducted on the following topics: (1) development and operation of meteorological databases, such as the GPV/JMA Archive and the Japanese 25-(55-)year ReAnalysis (JRA-25/55) Archives (Figure 2), (2) development and operation of the Japan/International Lattice Data Grids (JLDG/ILDG), which enable researchers in particle physics to share and exchange lattice QCD gauge configurations, and (3) machine learning algorithms for effectively analyzing biological and medical data. Specifically, we have developed novel sleep stage analysis algorithms for mice and humans in collaboration with International Institute for Integrative Sleep Medicine in University of Tsukuba. One algorithm automatically classifies states of mice into the REM sleep, Non-REM sleep, and wake stages with accuracy of more than 95%.
(Update: Dec. 18, 2019)