Nautical Crime Investigation Services - Problem 1
Overview
NCIS is developing software to identify, track, and build profiles of vessels involved in incidents at sea. The software is intended to use a large language model to collect and analyze publicly available web data. Starting from a predefined list of incident categories, the algorithm will gather information about vessels and their reported incidents. It will then filter this data down to incident reports only and extract key details, such as vessel name, flag state, and company, for every vessel named in each report. These details will be used to build comprehensive profiles of the reported vessels.
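The profile-building step can be pictured as grouping extracted records by vessel. The following is a minimal sketch, assuming each extraction yields one record per vessel mention with the fields named above; the class and function names (`IncidentRecord`, `VesselProfile`, `build_profiles`) are hypothetical and not part of the NCIS software.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class IncidentRecord:
    """One vessel mention extracted from one incident report (hypothetical schema)."""
    vessel_name: str
    flag_state: Optional[str]
    company: Optional[str]
    category: str  # one of the predefined incident categories
    source_url: str


@dataclass
class VesselProfile:
    """All extracted records that share the same normalized vessel name."""
    vessel_name: str
    flag_states: set[str] = field(default_factory=set)
    companies: set[str] = field(default_factory=set)
    incidents: list[IncidentRecord] = field(default_factory=list)


def normalize_name(name: str) -> str:
    # Upper-case and collapse internal whitespace so "Ocean  Star" matches "OCEAN STAR".
    return " ".join(name.upper().split())


def build_profiles(records: list[IncidentRecord]) -> dict[str, VesselProfile]:
    """Group extracted records into one profile per normalized vessel name."""
    profiles: dict[str, VesselProfile] = {}
    for rec in records:
        key = normalize_name(rec.vessel_name)
        profile = profiles.setdefault(key, VesselProfile(vessel_name=key))
        # Keep every distinct flag state and company seen, since a vessel can
        # change flag or owner over time.
        if rec.flag_state:
            profile.flag_states.add(rec.flag_state)
        if rec.company:
            profile.companies.add(rec.company)
        profile.incidents.append(rec)
    return profiles
```

Matching on normalized names alone is fragile because vessels can be renamed; where a report gives an IMO number, that would be a sturdier grouping key.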
Proposed Problem
This project aims to use natural language processing (NLP) techniques to generate search engine prompts that return high-quality results. Quality indicators include relevance, diversity, soundness, and low bias.
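Of these quality indicators, diversity is the easiest to approximate numerically. A minimal sketch, assuming diversity is measured lexically as the mean pairwise Jaccard distance between the word sets of the generated prompts (0 means every prompt uses the same words, 1 means no two prompts share a word); the function names are illustrative, not an established NCIS metric.

```python
from itertools import combinations


def tokens(prompt: str) -> set[str]:
    # Lower-cased whitespace tokens; no stemming or stop-word removal.
    return set(prompt.lower().split())


def jaccard_distance(a: set[str], b: set[str]) -> float:
    # 1 - |A ∩ B| / |A ∪ B|; two empty sets are treated as identical.
    if not a and not b:
        return 0.0
    return 1.0 - len(a & b) / len(a | b)


def prompt_diversity(prompts: list[str]) -> float:
    """Mean Jaccard distance over all unordered pairs of prompts."""
    pairs = list(combinations(prompts, 2))
    if not pairs:
        return 0.0
    return sum(jaccard_distance(tokens(p), tokens(q)) for p, q in pairs) / len(pairs)
```

A lexical measure misses paraphrases that share few words; comparing sentence embeddings would capture those, at the cost of a model dependency.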
Note that this problem is independent of the other problem statements submitted by NCIS.
Context on web scraping
With our current algorithm, users specify a number n, and when a prompt is entered into the search engine, the top n search results are scraped. Any site within these top n results is scraped; there is no predetermined list of sites to include or exclude. As part of optimizing the prompt generation system, however, participants may want to compare results across different values of n and identify sources that are irrelevant and should not be scraped.
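Comparing values of n requires some relevance judgment for each scraped result. The sketch below assumes such judgments already exist as a set of relevant URLs (for example from manual review or a classifier; this is an assumption, not part of the current algorithm). It computes precision at several values of n and flags domains that recur without ever yielding a relevant page as candidates for exclusion. All function names are hypothetical.

```python
from collections import defaultdict
from urllib.parse import urlparse


def domain(url: str) -> str:
    # Host name without a leading "www." so "www.a.org" and "a.org" count as one source.
    host = urlparse(url).netloc.lower()
    return host[4:] if host.startswith("www.") else host


def precision_at_n(urls: list[str], relevant: set[str], n: int) -> float:
    # Share of the top n results (in search-engine rank order) judged relevant.
    top = urls[:n]
    return sum(u in relevant for u in top) / len(top) if top else 0.0


def compare_n(urls: list[str], relevant: set[str], ns: list[int]) -> dict[int, float]:
    return {n: precision_at_n(urls, relevant, n) for n in ns}


def irrelevant_domains(urls: list[str], relevant: set[str], min_hits: int = 2) -> set[str]:
    # Domains seen at least min_hits times that never produced a relevant result.
    seen: dict[str, int] = defaultdict(int)
    useful: dict[str, int] = defaultdict(int)
    for u in urls:
        d = domain(u)
        seen[d] += 1
        useful[d] += u in relevant
    return {d for d, count in seen.items() if count >= min_hits and useful[d] == 0}


def filter_excluded(urls: list[str], excluded: set[str]) -> list[str]:
    # Drop results from domains on the exclusion list before scraping.
    return [u for u in urls if domain(u) not in excluded]
```

The `min_hits` threshold keeps a domain from being excluded on the strength of a single unlucky result.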
Assigned topic: underreporting/misreporting of catch
Baseline Problem Statement
Develop a system that generates diverse and relevant search prompts for web scraping publicly available incident reports on the given topic.
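A simple baseline, before any LLM is involved, is to combine seed phrases for the assigned topic into query strings. This is a minimal sketch, assuming hand-written seed lists; every term in `TOPIC_PHRASES`, `VESSEL_TERMS`, and `EVIDENCE_TERMS` is illustrative, and an LLM could be asked to extend each list with paraphrases.

```python
import random
from itertools import product
from typing import Optional

# Hypothetical seed vocabulary for the assigned topic (underreporting/misreporting of catch).
TOPIC_PHRASES = ["underreporting of catch", "misreported catch", "unreported fishing"]
VESSEL_TERMS = ["fishing vessel", "trawler", "longliner"]
EVIDENCE_TERMS = ["fined", "investigation", "inspection report"]


def generate_prompts(topics: list[str], vessels: list[str], evidence: list[str],
                     limit: Optional[int] = None, seed: int = 0) -> list[str]:
    """Combine seed terms into unique query strings, shuffled so a truncated list stays varied."""
    seen: set[str] = set()
    prompts: list[str] = []
    for topic, vessel, ev in product(topics, vessels, evidence):
        prompt = f"{vessel} {topic} {ev}"
        if prompt.lower() not in seen:  # drop case-insensitive duplicates
            seen.add(prompt.lower())
            prompts.append(prompt)
    # Shuffle with a fixed seed for reproducibility; without it, the first prompts
    # from product() would all share the same topic phrase.
    random.Random(seed).shuffle(prompts)
    return prompts if limit is None else prompts[:limit]
```

Template combination guarantees coverage of every seed term but produces rigid phrasing; LLM-generated paraphrases are one way to add variety on top of it.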
Extended Statement (time permitting)
Develop a dynamic prompt generation system that optimizes search engine prompts for web scraping publicly available incident reports on the given topic.
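One possible way to make prompt selection dynamic is to treat each candidate prompt as an arm of a multi-armed bandit and spend more searches on prompts whose results score well. The sketch below uses epsilon-greedy selection; `score` is a hypothetical callable standing in for "run the search, scrape the top n results, and return the fraction judged to be relevant incident reports", and none of the names come from the NCIS system.

```python
import random
from typing import Callable


class EpsilonGreedyPrompts:
    """Epsilon-greedy bandit where each arm is a candidate search prompt."""

    def __init__(self, prompts: list[str], epsilon: float = 0.1, seed: int = 0):
        self.prompts = list(prompts)
        self.epsilon = epsilon
        self.counts = {p: 0 for p in self.prompts}
        self.means = {p: 0.0 for p in self.prompts}
        self.rng = random.Random(seed)

    def select(self) -> str:
        # Try every prompt once before exploiting any of them.
        untried = [p for p in self.prompts if self.counts[p] == 0]
        if untried:
            return untried[0]
        # With probability epsilon explore a random prompt; otherwise exploit the best mean.
        if self.rng.random() < self.epsilon:
            return self.rng.choice(self.prompts)
        return max(self.prompts, key=lambda p: self.means[p])

    def update(self, prompt: str, reward: float) -> None:
        # Incremental running mean of the rewards observed for this prompt.
        self.counts[prompt] += 1
        self.means[prompt] += (reward - self.means[prompt]) / self.counts[prompt]

    def run(self, score: Callable[[str], float], rounds: int) -> list[tuple[str, float]]:
        """Play `rounds` searches and return prompts ranked by mean reward."""
        for _ in range(rounds):
            prompt = self.select()
            self.update(prompt, score(prompt))
        return sorted(self.means.items(), key=lambda kv: kv[1], reverse=True)
```

Epsilon-greedy is chosen here for simplicity; upper confidence bound (UCB) or Thompson sampling are common alternatives. This sketch keeps the prompt set fixed, whereas a fuller system might retire low-scoring prompts and add newly generated ones between rounds.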
Skills
Required
- NLP techniques and tools for text analysis
- Familiarity with large language models
Preferred
- Proficiency in web scraping techniques and tools