Service Project

WATS-SMS: Summarizing Wikipedia Pages into SMS

April 12, 2024

JEAN LOUIS FENDJI KEDIENG EBONGUE

Administrator

Text summarization remains a difficult task in Natural Language Processing, despite its many applications in business and daily life. One common use case is web page summarization, which makes it possible to deliver overviews of pages to devices with limited capabilities. Although mobile device penetration is increasing in rural areas, most of these devices offer limited features, and the areas themselves often have restricted connectivity, such as GSM-only coverage. Summarizing web pages into SMS format is therefore an important way to deliver information to such devices.
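To give a sense of the size constraint that SMS delivery imposes, the following minimal sketch estimates how many SMS parts a text would occupy. It assumes the GSM 7-bit default alphabet, where a single message holds 160 characters and each part of a concatenated message holds 153. The function names are hypothetical, and the sketch simply counts Python characters: it ignores extended characters that take two positions and texts that would require UCS-2 encoding (70 and 67 characters respectively).

```python
import math

# Assumed limits for the GSM 7-bit default alphabet: 160 characters in a
# single SMS, 153 per part once a message has to be split, because each
# part then carries a concatenation header.
SINGLE_SMS_CHARS = 160
MULTIPART_SMS_CHARS = 153


def sms_parts_needed(text: str) -> int:
    """Estimate how many SMS parts `text` occupies under the assumed limits."""
    if len(text) <= SINGLE_SMS_CHARS:
        return 1
    return math.ceil(len(text) / MULTIPART_SMS_CHARS)


def split_into_sms(text: str) -> list[str]:
    """Cut `text` into consecutive chunks that each fit in one SMS part."""
    if len(text) <= SINGLE_SMS_CHARS:
        return [text]
    return [
        text[start:start + MULTIPART_SMS_CHARS]
        for start in range(0, len(text), MULTIPART_SMS_CHARS)
    ]
```

Under these assumptions a 5,000-character page would need 33 parts, which illustrates why a short summary is far more practical to send than the page itself.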

This work presents WATS-SMS, an abstractive French Wikipedia text summarization system for SMS based on T5. It is built using a **transfer learning** approach: the pre-trained English T5 model is fine-tuned on 25,000 Wikipedia pages to obtain a French text summarization model, which is then compared with various approaches from the literature.

The objective is twofold:
1. To verify the hypothesis formulated in the literature that abstractive models provide better results compared to extractive models.
2. To evaluate the performance of our model against other existing abstractive models.

On a score based on ROUGE metrics, WATS-SMS reached 52% for articles of up to 500 characters, compared with 34.2% for Transformer-ED and 12.7% for seq2seq-attention. For longer articles it reached 77%, compared with 37% for Transformer-DMCA. In addition, an architecture including a software SMS gateway was developed so that owners of mobile devices with limited features can send queries and receive summaries over the GSM network.
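For readers unfamiliar with ROUGE, the sketch below shows how an n-gram overlap score of this family can be computed. It is only an illustration: the text does not say which ROUGE variant or tokenization was used for the reported figures, so the whitespace tokenization, lowercasing, and the function name `rouge_n` are assumptions made here.

```python
from collections import Counter


def rouge_n(candidate: str, reference: str, n: int = 1) -> dict[str, float]:
    """Compute ROUGE-N precision, recall and F1 between two texts.

    Assumption: tokens are lowercased, whitespace-separated words.
    """

    def ngrams(text: str) -> Counter:
        tokens = text.lower().split()
        return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

    cand, ref = ngrams(candidate), ngrams(reference)
    # Counter intersection keeps the minimum count of each shared n-gram,
    # so a repeated n-gram is not credited more often than it appears
    # in the reference.
    overlap = sum((cand & ref).values())
    precision = overlap / sum(cand.values()) if cand else 0.0
    recall = overlap / sum(ref.values()) if ref else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return {"precision": precision, "recall": recall, "f1": f1}


scores = rouge_n("le chat dort", "le chat noir dort")
print(scores)  # precision 1.0, recall 0.75 for this toy pair
```

Recall measures how much of the reference summary is covered by the generated one, while precision penalizes generated words that do not appear in the reference. F1 combines the two.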
