Shuichiro Shimizu

I am a first-year Ph.D. student at the Kurohashi-Chu-Murawaki lab, Kyoto University.
My research interests center on natural language processing. I am currently working on information extraction and large language models.

Education

Ph.D. student, Informatics (Intelligence Science and Technology), Kyoto University Apr. 2023 - Present
M.S. student, Informatics (Intelligence Science and Technology), Kyoto University Apr. 2021 - Mar. 2023
B.E., Energy Engineering, Zhejiang University Sep. 2014 - Sep. 2018

Experiences

  • Internship at SenseTime Japan

    Mar. 2022 - Mar. 2023

    NEC Corporation is a company that provides software and networking services to companies and government agencies. I worked on multilingual automatic speech recognition at the biometrics research lab.

  • Trainee at NICT

    Nov. 2020 - Present

    NICT (National Institute of Information and Communications Technology) is Japan's national research agency specializing in information and communications technology. I am conducting research as a trainee at the speech processing lab (ASTREC) with Dr. Sheng Li.

  • Developer at Kangi

    May 2018 - Present

    Kangi is a land survey company. I maintain almost all the programs used there, such as the company website, a program to read data from transits, and a program to display the read data. I developed a system for structure relocation with my colleagues.

  • Office assistant at Oki lab

    Jan. 2019 - Mar. 2020

    Oki lab (Department of Communications and Computer Engineering, Graduate School of Informatics, Kyoto University) is a laboratory working on networking technologies. I helped develop a system to construct a relational network of people based on their brain activities. Please refer to the publication for more details.

Publications

  • Iglika Nikolova-Stoupak, Shuichiro Shimizu, Chenhui Chu, and Sadao Kurohashi. Filtering of Noisy Web-Crawled Parallel Corpus: the Japanese-Bulgarian Language Pair. In Fifth International Conference on Computational Linguistics in Bulgaria 2022 (CLIB 2022), 2022.
  • Shuichiro Shimizu, Chenhui Chu, Sheng Li, and Sadao Kurohashi. Cross-Lingual Transfer Learning for End-to-End Speech Translation. In 自然言語処理 (Journal of Natural Language Processing), Vol.29, No.2, 2022.
  • Yihang Li, Shuichiro Shimizu, Chenhui Chu, and Sadao Kurohashi. 曖昧性を含む翻訳に着目したマルチモーダル機械翻訳データセットの構築方法の検討 (Towards the Construction of Multimodal Machine Translation Dataset Focusing on Ambiguity of Translations). In 言語処理学会 第28回年次大会 発表論文集 (Proceedings of the 28th Annual Meeting of the Association for Natural Language Processing), 2022.
  • Yihang Li, Shuichiro Shimizu, Weiqi Gu, Chenhui Chu, and Sadao Kurohashi. VISA: An Ambiguous Subtitles Dataset for Visual Scene-Aware Machine Translation. In Proceedings of the 13th International Conference on Language Resources and Evaluation (LREC 2022), 2022.
  • Shuichiro Shimizu, Chenhui Chu, Sheng Li, and Sadao Kurohashi. End-to-End Speech Translation with Cross-lingual Transfer Learning. In 言語処理学会 第27回年次大会 発表論文集 (Proceedings of the 27th Annual Meeting of the Association for Natural Language Processing), 2021.
  • (Contribution to work) Ryoichi Shinkuma, Satoshi Nishida, Masataka Kado, Naoya Maeda, and Shinji Nishimoto. Relational network of people constructed on the basis of similarity of brain activities. In IEEE Access, 2019.

Awards

  • Special Committee Award for the paper "Towards the Construction of Multimodal Machine Translation Dataset Focusing on Ambiguity of Translations" at the 28th Annual Meeting of the Association for Natural Language Processing.

Skills

Programming languages

  • Development on Linux
    • Shell scripts (bash), Ansible, Docker
  • Python
    • NumPy, PyTorch, etc.
  • HTML, CSS, JavaScript, TypeScript
    • React, Flask, FastAPI
  • TeX
    • LaTeX, Beamer, TikZ, CircuiTikZ
  • C, C++, Visual C++

Natural languages

  • Japanese (native)
  • English (CEFR C1; TOEFL iBT 102/120)
  • French (subjectively CEFR A2)
  • Chinese, Korean, Arabic (subjectively CEFR A1)
  • Latin, Spanish (pre-A1)

Activities