Climate change is no longer just a prediction about the future but a reality we feel every day, from increasingly scorching temperatures and erratic flash floods to shifts in the planting season that confuse our farmers. However, amid the urgent need for real action, another major challenge lurks in the digital world: hoaxes and climate change misinformation.
In Indonesia, conversations about environmental issues are often clouded by false information. Some say that global warming is just a conspiracy, while others spread false claims about the causes of natural disasters. The impact is severe: the public becomes skeptical and hesitant to act, and environmental protection policies struggle to win full support.
Unfortunately, hoax makers are very smart. They spread fake news not only in standard Indonesian but also in regional languages that are closer to people’s hearts. This is where the big problem lies: we lack the digital tools and linguistic resources to detect hoaxes in regional languages. As a result, conventional hoax-filtering systems often miss hoaxes written in anything other than official Indonesian.
Getting to Know NusaClimate: A New Weapon Against Hoaxes
Seeing this critical gap, a group of researchers at the Indonesia AI Institute (IAII) is developing an innovative solution called NusaClimate.
NusaClimate is the first large multilingual corpus deliberately created to detect people’s attitudes, or stances, on the issue of climate change. The dataset contains 50,613 texts covering four languages:
- English
- Minangkabau
- Balinese
- Bugis
The presence of three regional languages (Minangkabau, Balinese, and Bugis) is very important because all three are classified as languages with minimal digital resources (low-resource languages). With NusaClimate, artificial intelligence (AI) now has an adequate “dictionary” to understand the local context in depth.
How the Technology and Experiment Work
How is that much data turned into a hoax-detection system? The answer lies in a technology called an encoder-based language model.
Think of this system as a highly sensitive language detective. When a new claim circulates on social media, the AI performs a semantic comparison, that is, a comparison of meaning. The system matches the claim against a premise (a scientific fact or valid data in the NusaClimate dataset), even when the two are written in different languages (cross-lingual).
Through this framework, the IAII researchers are in the process of building a real-time climate misinformation checker that the wider community can use to tell right away which news is valid and which is a hoax.
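To make the idea of semantic comparison more concrete, here is a minimal Python sketch. It is not the IAII implementation. It assumes that an encoder has already turned each text into a vector of numbers (an embedding) and simply picks the premise whose vector points in the most similar direction to the claim's vector. The function names, the example premises, and the three-number vectors are all hypothetical placeholders; real encoder embeddings have hundreds of dimensions.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two vectors (1.0 means same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an all-zero vector carries no meaning to compare
    return dot / (norm_a * norm_b)

def best_matching_premise(claim_vec, premises):
    """Return (premise_text, similarity) for the premise closest to the claim."""
    scored = ((text, cosine_similarity(claim_vec, vec)) for text, vec in premises)
    return max(scored, key=lambda pair: pair[1])

# Hypothetical premises with hand-written placeholder vectors.
# In a real system these vectors would come from the encoder model.
premises = [
    ("Global average temperatures have risen since the pre-industrial era.",
     [0.9, 0.1, 0.0]),
    ("Deforestation increases flood risk.",
     [0.1, 0.8, 0.3]),
]

# Placeholder vector for a new claim, possibly written in another language.
claim_vec = [0.8, 0.2, 0.1]

text, score = best_matching_premise(claim_vec, premises)
print(f"Closest premise: {text} (similarity {score:.2f})")
```

Because a multilingual encoder aims to place texts with the same meaning close together regardless of language, the same comparison can in principle work across languages. Finding the closest premise is only the matching step; deciding whether the claim agrees with or contradicts that premise is a separate classification task.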
Why Should AI Be “Trained” Again?
To ensure this AI works well, the IAII researchers use a process called fine-tuning. Why is this important?
These AI models are already good at reading language in general, but they need to be specifically “trained” to understand scientific environmental terms and local climate-related slang. In this experiment, the researchers tested three popular large language models:
- IndoBERT (from IndoNLU) – Very good at understanding the structure of the formal Indonesian language.
- IndoBERT-Nusa – An improved version designed to understand language variation across the archipelago.
- XLM-RoBERTa Large – A powerful international multilingual model that bridges meaning across languages.
The experiment uses supervised fine-tuning, in which the AI is trained with optimized hyperparameters on two main tasks: detecting the attitude of a text toward climate misinformation and classifying the topics and subtopics of the environmental issue.
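The core idea of supervised training can be sketched in a few lines of plain Python. This is a deliberately simplified illustration, not the researchers' setup: it trains only a small classification layer on top of fixed, hand-written feature vectors, whereas real fine-tuning also updates the weights of the encoder itself. The label names, the toy vectors, the learning rate, and the number of epochs are all hypothetical.

```python
import math

# Hypothetical stance labels; the actual NusaClimate label set may differ.
LABELS = ["agree", "disagree", "neutral"]

def softmax(logits):
    """Turn raw scores into probabilities that sum to 1."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def predict_proba(weights, bias, x):
    """Class probabilities for one feature vector x."""
    logits = [sum(w * xi for w, xi in zip(row, x)) + b
              for row, b in zip(weights, bias)]
    return softmax(logits)

def train(examples, n_features, n_classes, lr=0.5, epochs=200):
    """Stochastic gradient descent on the cross-entropy loss."""
    weights = [[0.0] * n_features for _ in range(n_classes)]
    bias = [0.0] * n_classes
    for _ in range(epochs):
        for x, y in examples:
            probs = predict_proba(weights, bias, x)
            for k in range(n_classes):
                # Gradient of cross-entropy w.r.t. logit k: p_k - 1[k == y]
                grad = probs[k] - (1.0 if k == y else 0.0)
                for j in range(n_features):
                    weights[k][j] -= lr * grad * x[j]
                bias[k] -= lr * grad
    return weights, bias

def predict_label(weights, bias, x):
    probs = predict_proba(weights, bias, x)
    return LABELS[probs.index(max(probs))]

# Toy labeled data: (placeholder text embedding, index into LABELS).
examples = [
    ([1.0, 0.0, 0.0], 0), ([0.9, 0.1, 0.0], 0),
    ([0.0, 1.0, 0.0], 1), ([0.1, 0.9, 0.0], 1),
    ([0.0, 0.0, 1.0], 2), ([0.0, 0.1, 0.9], 2),
]

weights, bias = train(examples, n_features=3, n_classes=len(LABELS))
print(predict_label(weights, bias, [0.8, 0.2, 0.0]))
```

The same loop would apply to the topic and subtopic task by swapping in a different label list. The learning rate and epoch count here stand in for the hyperparameters that the researchers tune.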
This research is in progress.