
AMMICO - AI Media and Misinformation Content Analysis Tool


This package extracts data from images such as social media posts that contain both an image part and a text part. Depending on the user input, the analysis can generate a large number of features.

This project is currently under development!

Take pre-processed image files, such as social media posts with comments, and process them to collect the following information:

  1. Text extraction from the images
    1. Language detection
    2. Translation into English or other languages
    3. Cleaning of the text, spell-check
    4. Sentiment analysis
    5. Subjectivity analysis
    6. Named entity recognition
    7. Topic analysis
  2. Content extraction from the images
    1. Textual summary of the image content ("image caption") that can be analyzed further using the above tools
    2. Feature extraction from the images: User inputs query and images are matched to that query (both text and image query)
    3. Question answering
  3. Performing person and face recognition in images
    1. Face mask detection
    2. Age, gender and race detection
    3. Emotion recognition
  4. Object detection in images
    1. Detection of position and number of objects in the image; currently person, bicycle, car, motorcycle, airplane, bus, train, truck, boat, traffic light, cell phone
  5. Cropping images to remove comments from posts
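Conceptually, each analysed image ends up as a dictionary of features filled in by the steps above. The sketch below is illustrative: the text-related keys ("text", "text_language", "text_english") match those documented under Features, while the remaining key names are assumptions made for this example, not AMMICO's actual output schema.

```python
# Sketch of a per-image feature record that the analysis steps fill in.
# Only the text-related keys are documented below; the rest are placeholders.
record = {
    "filename": "post_001.png",
    # 1. Text extraction
    "text": "Detected text on the image",
    "text_language": "en",
    "text_english": "Detected text on the image",
    # 2. Content extraction
    "image_caption": "a person holding a phone",
    # 3. Face analysis
    "faces": [{"wears_mask": False, "age": 31, "gender": "Woman", "emotion": "happy"}],
    # 4. Object detection (label -> count)
    "objects": {"person": 1, "cell phone": 1},
}

# Flattening such records gives one row per image when exporting to CSV.
columns = sorted(record.keys())
```

Each top-level key then becomes a column when the collected records are exported as a table.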

Installation

The AMMICO package can be installed using pip:

pip install git+https://github.com/ssciwr/ammico.git

This will install the package and its dependencies locally.

Usage

There are sample notebooks in the notebooks folder for you to explore the package:

  1. Text extraction: Use the notebook get-text-from-image.ipynb to extract any text from the images. The text is directly translated into English. If the text should be further analysed, set the keyword analyse_text to True as demonstrated in the notebook. You can run this notebook on Google Colab.
    Place the data files and the Google Cloud Vision API key in your Google Drive to access the data.
  2. Emotion recognition: Use the notebook facial_expressions.ipynb to identify whether there are faces in the image and whether they are wearing masks; for faces without masks, the race, gender and dominant emotion are also detected. You can run this notebook on Google Colab.
    Place the data files in your Google Drive to access the data.
  3. Content extraction: Use the notebook image_summary.ipynb to create captions for the images and ask questions about the image content. You can run this notebook on Google Colab.
  4. Multimodal content: Use the notebook multimodal_search.ipynb to find the images that best fit an image or text query. You can run this notebook on Google Colab.
  5. Object analysis: Use the notebook ojects_expression.ipynb to identify certain objects in the image. Currently, the following objects are identified: person, bicycle, car, motorcycle, airplane, bus, train, truck, boat, traffic light, cell phone. You can run this notebook on Google Colab.

There are further notebooks that are currently of an exploratory nature (colors_expression.ipynb to identify certain colors in the image). To crop social media posts, use the cropposts.ipynb notebook.

Features

Text extraction

The text is extracted from the images using google-cloud-vision. For this, you need an API key. Set up your Google account following the instructions on the Google Vision AI website. You then need to export the location of the API key as an environment variable:

export GOOGLE_APPLICATION_CREDENTIALS="location of your .json"

The extracted text is then stored under the text key (column when exporting a csv).
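The "key becomes a column" convention can be sketched with the standard library alone. The example below assumes results are held as a list of per-image dictionaries; the data values are illustrative and the export helper is not part of AMMICO's API.

```python
import csv
import io

# Per-image results: each dict maps feature keys to values (illustrative data).
results = [
    {"filename": "a.png", "text": "hello", "text_language": "en"},
    {"filename": "b.png", "text": "bonjour", "text_language": "fr"},
]

# Each feature key becomes a column in the exported CSV.
buffer = io.StringIO()
writer = csv.DictWriter(buffer, fieldnames=["filename", "text", "text_language"])
writer.writeheader()
writer.writerows(results)
csv_text = buffer.getvalue()
```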

Googletrans is used to recognize the language automatically and translate it into English. The text language and translated text are then stored under the text_language and text_english keys (columns when exporting a csv).

If you further want to analyse the text, you have to set the analyse_text keyword to True. The text is then processed using spacy (tokenized, part-of-speech, lemma, ...). The English text is cleaned of numbers and unrecognized words (text_clean), the spelling of the English text is corrected (text_english_correct), and sentiment and subjectivity analyses are carried out (polarity, subjectivity). The latter two steps use TextBlob. For more information on sentiment analysis with TextBlob see here.
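As a rough stand-in for the cleaning step, here is a minimal sketch that drops numbers and out-of-vocabulary tokens. The toy vocabulary and the clean_text helper are assumptions for illustration only; the real pipeline relies on spacy's language model and TextBlob.

```python
import re

# Toy vocabulary standing in for spacy's language model (illustrative only).
VOCABULARY = {"the", "vote", "is", "tomorrow", "important"}

def clean_text(text: str) -> str:
    """Drop numbers and tokens not in the vocabulary, mimicking how a
    cleaned text field could be derived from the extracted text."""
    tokens = re.findall(r"[a-zA-Z]+", text.lower())  # strips digits and punctuation
    return " ".join(t for t in tokens if t in VOCABULARY)

cleaned = clean_text("The vote is tomorrow 2024!! xyzzy")
```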

The Hugging Face transformers library is used to perform another sentiment analysis, a text summary, and named entity recognition, using the transformers pipeline.

Content extraction

The image content ("caption") is extracted using the LAVIS library. This library enables vision intelligence extraction using several state-of-the-art models, depending on the task. Further, it allows feature extraction from the images, where users can input textual and image queries, and the images in the database are matched to that query (multimodal search). Another option is question answering, where the user inputs a text question and the library finds the answer in the image content.
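The multimodal search can be pictured as nearest-neighbour matching in a shared embedding space. The sketch below uses hand-made vectors as an assumption for illustration; real embeddings would come from a LAVIS feature extractor.

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two equal-length feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm

# Toy image embeddings and a query embedding (illustrative values only).
image_features = {
    "cat.png": [0.9, 0.1, 0.0],
    "car.png": [0.1, 0.9, 0.2],
}
query = [0.8, 0.2, 0.1]

# Rank images by similarity to the query; the best match comes first.
ranking = sorted(
    image_features,
    key=lambda name: cosine_similarity(image_features[name], query),
    reverse=True,
)
```

The same ranking works whether the query embedding came from a text prompt or from another image, which is what makes the search multimodal.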

Emotion recognition

Emotion recognition is carried out using the deepface and retinaface libraries. These libraries detect the presence of faces and their age, gender, emotion and race based on several state-of-the-art models. They also detect whether the person is wearing a face mask; if so, no further detection is carried out, as the mask prevents an accurate prediction.
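The mask-gating logic described above can be sketched as follows. The detector inputs are mocked dictionaries and the analyse_face helper is an assumption for this example; the real analysis is done by deepface and retinaface.

```python
def analyse_face(face: dict) -> dict:
    """Return facial attributes, skipping age/gender/race/emotion when a
    mask is detected, since the mask prevents an accurate prediction."""
    result = {"face_detected": True, "wears_mask": face["wears_mask"]}
    if not face["wears_mask"]:
        # Attribute detection only runs on unmasked faces.
        result.update(
            age=face["age"], gender=face["gender"],
            race=face["race"], emotion=face["emotion"],
        )
    return result

masked = analyse_face({"wears_mask": True})
unmasked = analyse_face({"wears_mask": False, "age": 28, "gender": "Man",
                         "race": "asian", "emotion": "neutral"})
```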

Object detection

Object detection is carried out using cvlib and the YOLOv4 model. This library detects faces, people, and several inanimate objects; the output is currently restricted to person, bicycle, car, motorcycle, airplane, bus, train, truck, boat, traffic light, cell phone.
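Restricting the detector output to the supported labels amounts to a filter-and-count step, sketched below. The raw label list is illustrative; cvlib would also return bounding boxes and confidences alongside the labels.

```python
from collections import Counter

# Labels that AMMICO currently reports for object detection.
ALLOWED = {"person", "bicycle", "car", "motorcycle", "airplane", "bus",
           "train", "truck", "boat", "traffic light", "cell phone"}

def count_objects(detected_labels):
    """Keep only the supported labels and count occurrences per image,
    mimicking the restricted detector output."""
    return Counter(label for label in detected_labels if label in ALLOWED)

# Raw detector output for one image (illustrative values).
counts = count_objects(["person", "person", "dog", "cell phone"])
```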

Cropping of posts

Social media posts can automatically be cropped to remove further comments on the page and restrict the textual content to the first comment only.
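The cropping step can be pictured as cutting the image at the vertical position where the second comment starts. The toy sketch below operates on a list of pixel rows with assumed comment offsets; the real implementation works on actual image data and detected text regions.

```python
def crop_to_first_comment(rows, comment_start_rows):
    """Keep everything above the start of the second comment.
    rows: image as a list of pixel rows; comment_start_rows: detected
    vertical offsets where each comment begins (illustrative inputs)."""
    if len(comment_start_rows) < 2:
        return rows  # only one comment: nothing to crop
    return rows[: comment_start_rows[1]]

image_rows = [f"row{i}" for i in range(10)]
cropped = crop_to_first_comment(image_rows, comment_start_rows=[3, 7])
```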

