
Identifying Legality of Japanese Online Advertisements Using Complex-Valued Support Vector Machine with DFT-Based Document Features

EasyChair Preprint no. 6692

14 pages. Date: September 25, 2021

Abstract

As the Internet advertising market expands, the number of advertisements containing inappropriate language is increasing. Advertisements that exaggerate the efficacy of products may contravene the Pharmaceutical Affairs Law and the Act against Unjustifiable Premiums and Misleading Representations. Therefore, a system that can detect problematic expressions is required. Some advertisements cannot be classified using word statistics alone, so embedding other information, such as word order and word period, in the features is effective for categorizing documents. However, the amount of labeled data for advertising documents is limited; consequently, models with complex structures tend to overfit. Features and discriminant models with high generalization performance must therefore be found even when data are scarce. To address these issues, we propose a document feature based on the discrete Fourier transform (DFT) of word vectors weighted using an index previously proposed in a study that attempted to categorize Chinese online advertisements. We also propose a document discriminant model based on a complex-valued support vector machine.

We demonstrate that the proposed model outperforms previous models in discriminative performance as measured by the F-measure. We found that the proposed index emphasizes word vectors of specific nouns and verbs in Japanese advertisements. In addition, we found that DFT significantly increased the norms of document vectors of illegal documents. These factors contributed to the better performance of the proposed model.
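As a rough illustration of the kind of feature the abstract describes, the sketch below applies a DFT along the word-position axis of weighted word vectors and keeps the first few frequency coefficients. It is not the authors' exact formulation: the function name `dft_document_feature`, the `weights` argument (standing in for the weighting index), and the `num_freqs` cutoff are all hypothetical, and the complex-valued support vector machine that would consume the feature is not shown.

```python
import cmath
from typing import List, Sequence


def dft_document_feature(
    word_vectors: Sequence[Sequence[float]],
    weights: Sequence[float],
    num_freqs: int,
) -> List[complex]:
    """Return a complex-valued document feature (illustrative sketch only).

    Each embedding dimension of the weighted word vectors is treated as a
    signal over word position; the first `num_freqs` DFT coefficients of
    every dimension are concatenated into one feature vector.
    """
    n_words = len(word_vectors)
    if n_words == 0 or len(weights) != n_words:
        raise ValueError("need at least one word and exactly one weight per word")
    dim = len(word_vectors[0])
    feature: List[complex] = []
    for k in range(num_freqs):          # frequency index
        for d in range(dim):            # embedding dimension
            coeff = sum(
                weights[t] * word_vectors[t][d]
                * cmath.exp(-2j * cmath.pi * k * t / n_words)
                for t in range(n_words)
            )
            feature.append(coeff)
    return feature


# Toy 2-dimensional "word vectors" (assumed values, not real embeddings).
doc_a = [[1.0, 0.0], [0.0, 1.0], [1.0, 0.0], [0.0, 1.0]]
doc_b = [[0.0, 1.0], [1.0, 0.0], [0.0, 1.0], [1.0, 0.0]]  # same words, shifted order
uniform = [1.0, 1.0, 1.0, 1.0]

feat_a = dft_document_feature(doc_a, uniform, num_freqs=3)
feat_b = dft_document_feature(doc_b, uniform, num_freqs=3)
```

In this toy case the zero-frequency coefficients of `feat_a` and `feat_b` coincide, because they are just the weighted sum of the word vectors and ignore order. The higher-frequency coefficients differ, which is how a DFT-based feature can carry word-order information that plain word statistics miss.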

Keyphrases: Complex-valued Support Vector Machine, Discrete Fourier Transform, Internet advertisement, Natural Language Processing

BibTeX entry
BibTeX does not have the right entry for preprints. This is a hack for producing the correct reference:
@Booklet{EasyChair:6692,
  author = {Satoshi Kawamoto and Toshio Akimitsu and Kikuo Asai},
  title = {Identifying Legality of Japanese Online Advertisements Using Complex-Valued Support Vector Machine with DFT-Based Document Features},
  howpublished = {EasyChair Preprint no. 6692},

  year = {EasyChair, 2021}}