Nautical Crime Investigation Services - Problem 1

Overview

NCIS is developing software to identify, track, and build profiles for vessels involved in incidents at sea. The software will use a large language model to collect and analyze publicly available web data. Given a predefined list of incident categories, the algorithm will gather relevant information about vessels and their reported incidents. It will then filter this data down to incident reports only and extract key details, such as vessel name, flag state, and company, for every vessel named in a report. This information will be used to create comprehensive profiles of the reported vessels.
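
As a rough illustration of what such a profile could look like in code, here is a minimal sketch. The class names, field names, and the name-normalization rule (lowercasing and collapsing whitespace) are all assumptions made for this example and do not describe the NCIS implementation.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Set


@dataclass
class IncidentReport:
    """One extracted incident report (all field names are illustrative)."""
    source_url: str
    category: str
    vessel_name: str
    flag_state: Optional[str] = None
    company: Optional[str] = None


@dataclass
class VesselProfile:
    """Aggregates every report that mentions the same vessel."""
    vessel_name: str
    flag_states: Set[str] = field(default_factory=set)
    companies: Set[str] = field(default_factory=set)
    incidents: List[IncidentReport] = field(default_factory=list)

    def add(self, report: IncidentReport) -> None:
        # Missing details are skipped rather than stored as empty values.
        if report.flag_state:
            self.flag_states.add(report.flag_state)
        if report.company:
            self.companies.add(report.company)
        self.incidents.append(report)


def build_profiles(reports: List[IncidentReport]) -> Dict[str, VesselProfile]:
    """Group reports by vessel name, ignoring case and extra whitespace (assumed rule)."""
    profiles: Dict[str, VesselProfile] = {}
    for report in reports:
        key = " ".join(report.vessel_name.lower().split())
        profile = profiles.setdefault(key, VesselProfile(report.vessel_name))
        profile.add(report)
    return profiles
```

Keeping flag states and companies as sets allows for a vessel being reported under more than one flag or operator across sources. Real vessel matching would likely need a stronger identifier than the name alone.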

Proposed Problem

This project aims to use natural language processing techniques to generate search engine prompts that yield high-quality results. Quality indicators include relevance, diversity, soundness, and minimal bias.

Note that this problem is independent of the other problem statements submitted by NCIS.

Context on web scraping

With our current algorithm, users can specify how many of the top n search results should be scraped when a prompt is entered into the search engine. Any site within these top n results will be scraped, meaning there are no predetermined sites to include or exclude. However, as part of optimizing the prompt generation system, participants may want to compare the results for different values of n and identify any sources that are irrelevant and should not be scraped.
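
One possible way to compare values of n is sketched below. It assumes each scraped result has already been labeled relevant or irrelevant (by hand or by a classifier). That labeling step, the function names, and the use of precision as the comparison metric are assumptions for illustration and are not part of the current algorithm.

```python
from collections import Counter
from urllib.parse import urlparse


def precision_at_n(ranked, n):
    """Fraction of the top n results labeled relevant.

    `ranked` is a list of (url, is_relevant) pairs in search-engine order.
    """
    top = ranked[:n]
    if not top:
        return 0.0
    return sum(1 for _, relevant in top if relevant) / len(top)


def irrelevant_domains(ranked, n):
    """Count irrelevant results per domain within the top n."""
    counts = Counter()
    for url, relevant in ranked[:n]:
        if not relevant:
            counts[urlparse(url).netloc] += 1
    return counts
```

Computing `precision_at_n` over a range of n for the same prompt shows where additional results stop paying off. Domains that repeatedly appear in `irrelevant_domains` across prompts are candidates for exclusion from scraping.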

Assigned topic: underreporting/misreporting of catch

Baseline Problem Statement

Develop a system to generate diverse and relevant search prompts to web scrape publicly available incident reports on the given topic.
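
A simple baseline, sketched here under stated assumptions, combines term lists into candidate prompts and then discards near-duplicates. The term lists, the three-part template, the Jaccard token-overlap measure, and the 0.5 threshold are all illustrative choices and are not prescribed by the problem statement. An LLM-based generator could replace `generate_prompts` while keeping the same diversity filter.

```python
from itertools import product


def generate_prompts(topic_terms, actor_terms, document_terms):
    """Combine term lists into candidate search prompts."""
    return [
        f"{actor} {topic} {doc}"
        for topic, actor, doc in product(topic_terms, actor_terms, document_terms)
    ]


def jaccard(a, b):
    """Token-set overlap between two prompts, between 0 and 1."""
    tokens_a, tokens_b = set(a.lower().split()), set(b.lower().split())
    if not tokens_a and not tokens_b:
        return 1.0
    return len(tokens_a & tokens_b) / len(tokens_a | tokens_b)


def select_diverse(prompts, max_similarity=0.5):
    """Greedily keep prompts whose overlap with every kept prompt is below the cap."""
    kept = []
    for prompt in prompts:
        if all(jaccard(prompt, other) < max_similarity for other in kept):
            kept.append(prompt)
    return kept


# Hypothetical term lists for the assigned topic.
candidates = generate_prompts(
    ["catch underreporting", "catch misreporting"],
    ["fishing vessel"],
    ["incident report", "investigation"],
)
diverse = select_diverse(candidates)
```

Token overlap only measures diversity of wording. Whether the prompts also return diverse results would have to be checked against the scraped pages themselves.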

Extended Statement (time permitting)

Develop a dynamic prompt generation system that optimizes search engine prompts to web scrape publicly available incident reports on the given topic.
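
One way to make prompt selection dynamic is to treat each candidate prompt as an arm of a multi-armed bandit and reward it according to the quality of the results it returns. The epsilon-greedy sketch below is only one possible approach and is not specified by NCIS. The class name, the reward definition (for example, the fraction of relevant pages among the top n results), and the default epsilon are assumptions.

```python
import random


class PromptBandit:
    """Epsilon-greedy selection over a fixed set of candidate prompts."""

    def __init__(self, prompts, epsilon=0.1, seed=None):
        self.prompts = list(prompts)
        self.epsilon = epsilon
        self.rng = random.Random(seed)
        self.counts = {p: 0 for p in self.prompts}
        self.mean_reward = {p: 0.0 for p in self.prompts}

    def choose(self):
        """Explore with probability epsilon, otherwise pick the best mean so far."""
        if self.rng.random() < self.epsilon:
            return self.rng.choice(self.prompts)
        return max(self.prompts, key=lambda p: self.mean_reward[p])

    def update(self, prompt, reward):
        """Fold one observed reward into the running mean for that prompt."""
        self.counts[prompt] += 1
        self.mean_reward[prompt] += (reward - self.mean_reward[prompt]) / self.counts[prompt]
```

In use, each round would call `choose()`, run the search and scrape, score the results, and pass that score to `update()`. Over time the system favors prompts that return relevant incident reports while still occasionally trying the others.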

Skills

Required
  • NLP techniques and tools for text analysis
  • Familiarity with large language models
Preferable
  • Proficiency in web scraping techniques and tools
Sogol Ghattan
Director, Responsible Development of Emerging Technologies, NCIS
Jack Kendrick
PhD Student
Golnoush Farzanfard
Graduate student
Youssef Mousaaid
Postdoctoral Researcher