Cameras’ biggest recent advancements have come from AI, not sensors and lenses. Over the past couple of years, technology has enabled staggering advances in photography. AI is transforming both the way we shoot photos and how we edit them.
As ‘computer vision’ becomes an important part of other new technologies such as self-driving cars, AI is going to become increasingly sophisticated when it comes to interpreting and understanding the contents of our images.
As a passionate photographer, I always try to automate manual tasks so that I can focus on creative work. In this project, I will discuss how you can create and customize a word cloud from your own images in a few simple steps.
Objective: Turning photographs into typography art using a word cloud in Python.
A picture is worth a thousand words. Literally! There are 2200+ words in this picture. 😱
Typography: It is the art of arranging text in a visually appealing way. It aims to elicit certain emotions and convey specific messages.
At one point in time, people were literally taking letters and characters and arranging them in physical space. In this project, I will show how we can use word clouds in Python to make this art form scalable and create it in a matter of minutes.
Word cloud: Word Cloud is a data visualization technique used for representing text data in which the size of each word indicates its frequency or importance. Word clouds are widely used for analyzing data from social network websites for sentiment analysis.
Generating a word cloud in Python requires three modules: matplotlib, OpenCV, and wordcloud.
Here are the steps involved:
1. Relevant data collection (Web scraping)
2. Data Cleaning & Natural Language Processing (NLP)
3. Creating masks from the image and generating word clouds
1. Relevant Data Collection:
For this project, to get a list of the most popular words in the photography domain, I scraped 836 photography course titles (e.g. Advanced Portrait Editing Techniques) from KelbyOne, a popular photography website. I scraped the data from 70 pages using the Python module Scrapy. These courses were uploaded from 2006 onwards.
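To give a feel for what extracting titles from a listing page involves, here is a minimal offline sketch. It uses the standard library's `html.parser` as a stand-in for Scrapy, and the markup (`<h3 class="course-title">`) is a hypothetical example, not KelbyOne's actual page structure.

```python
from html.parser import HTMLParser


class TitleParser(HTMLParser):
    """Collect the text of every <h3 class="course-title"> element.

    The tag and class name are assumptions made for this illustration.
    """

    def __init__(self):
        super().__init__()
        self.titles = []
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs
        if tag == "h3" and ("class", "course-title") in attrs:
            self._in_title = True
            self.titles.append("")

    def handle_endtag(self, tag):
        if tag == "h3":
            self._in_title = False

    def handle_data(self, data):
        # Accumulate text only while inside a title element
        if self._in_title:
            self.titles[-1] += data


# Hypothetical fragment standing in for one downloaded listing page
sample_page = """
<div><h3 class="course-title">Advanced Portrait Editing Techniques</h3></div>
<div><h3 class="course-title">What The Flash? Controlling Your Light</h3></div>
"""

parser = TitleParser()
parser.feed(sample_page)
titles = [t.strip() for t in parser.titles]
print(titles)
```

A real scraper would repeat this for every listing page and respect the site's terms of use.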
2. Text Pre-processing:
I used the Python module spaCy to perform Natural Language Processing (NLP).
Tokenization: the process of splitting a string into its constituent tokens. These tokens may be words or punctuation marks.
Course Title: “What The Flash? Controlling Your Light”
Tokens: [“What”, “The”, “Flash”, “?”, “Controlling”, “Your”, “Light”]
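The split shown above can be approximated with a short regular expression. This is only a rough sketch using the standard library, not spaCy's tokenizer, which handles many more cases (contractions, URLs, abbreviations); the function name `tokenize` is my own.

```python
import re


def tokenize(title):
    # A run of word characters is one token; any other non-space
    # character (punctuation) becomes a token of its own.
    return re.findall(r"\w+|[^\w\s]", title)


tokens = tokenize("What The Flash? Controlling Your Light")
print(tokens)
# ['What', 'The', 'Flash', '?', 'Controlling', 'Your', 'Light']
```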
Lemmatization: converting each word into its base form.
E.g. words like reducing, reduces, and reduced are converted to reduce.
- Text Cleaning techniques:
Removing unnecessary whitespace, punctuation, special characters (numbers, emojis, etc.), and non-alphabetic tokens (e.g. D750).
Stop-word removal: dropping words that occur extremely commonly and do not add much meaning to a sentence.
E.g. articles (a, the, etc.), forms of the verb be (is, am, etc.), and pronouns (he, she, etc.).
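The cleaning rules above can be sketched in a few lines of plain Python. The stop-word list here is a tiny illustrative assumption (spaCy ships a much larger one), and lowercasing the tokens is also my assumption rather than something the pipeline is stated to do.

```python
# Tiny illustrative stop-word list (an assumption, not spaCy's list)
STOP_WORDS = {"a", "an", "the", "is", "am", "are", "he", "she", "it",
              "your", "what", "with"}


def clean_tokens(tokens):
    cleaned = []
    for token in tokens:
        word = token.strip().lower()
        # Drops punctuation, numbers, and mixed tokens such as "D750"
        if not word.isalpha():
            continue
        # Drops very common words that carry little meaning
        if word in STOP_WORDS:
            continue
        cleaned.append(word)
    return cleaned


tokens = ["What", "The", "Flash", "?", "Controlling", "Your", "Light", "with", "D750"]
cleaned = clean_tokens(tokens)
print(cleaned)
# ['flash', 'controlling', 'light']
```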
After processing, the data contains 3558 words in total, 1133 of them unique. All of these words are used to create the word cloud.
In a word cloud, the most frequent words are the most prominent: the higher the frequency, the larger the font size.
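To make the link between frequency and prominence concrete, the counts can be inspected with `collections.Counter`. The word list below is a made-up stand-in for the real 3558 cleaned words.

```python
from collections import Counter

# Hypothetical cleaned words; the real input is far larger
words = ["light", "portrait", "light", "flash", "portrait", "light"]

frequencies = Counter(words)
print(frequencies.most_common())
# [('light', 3), ('portrait', 2), ('flash', 1)]

# A single space-separated string, the form passed to WordCloud.generate()
text = " ".join(words)
```

Here "light" would be drawn largest. The wordcloud library does its own counting when given a string, and it also offers `generate_from_frequencies` if you prefer to pass counts like these directly.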
3. Creating masks from the image and generating word clouds
I created two masks of the image in Photoshop and applied a word cloud to each of them separately. The text fills the black portion of the mask. Each mask is filled with all the 1100+ unique words.
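The masks here were made by hand in Photoshop, but it may help to see what a mask looks like as data. As far as I understand the wordcloud library, pure-white pixels are treated as masked out and every other pixel is available for text. The sketch below builds a hypothetical mask with NumPy (a white canvas with a black disc); the size and shape are arbitrary choices for illustration.

```python
import numpy as np

# Hypothetical 200x200 mask: white (255) everywhere, black (0) disc in the middle
height, width = 200, 200
yy, xx = np.ogrid[:height, :width]
inside_disc = (xx - width // 2) ** 2 + (yy - height // 2) ** 2 <= 80 ** 2

mask = np.full((height, width), 255, dtype=np.uint8)
mask[inside_disc] = 0  # black = region where words may be drawn

# Share of the canvas that is available for text
drawable_fraction = float((mask == 0).mean())
print(mask.shape, round(drawable_fraction, 2))
```

An array like `mask` can be passed as the `mask` argument of `WordCloud` in place of an image loaded from disk.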
Mask 1 word cloud:
I kept the background black and the text white for mask 1 to highlight the subject and the arch of the monument.
The text string of all 3558 words (of which 1133 are unique) is passed to the generate method of the WordCloud object.
import cv2
from wordcloud import WordCloud
import matplotlib.pyplot as plt

# White text on a black background.
# `text` is the cleaned, space-separated string of words from step 2.
image = cv2.imread("D:/Photography/Typography/mask1.jpg", 1)  # load the mask as a 3-channel image
wordcloud = WordCloud(
    background_color='black',
    mask=image,
    mode="RGB",
    color_func=lambda *args, **kwargs: "white",  # draw every word in white
    width=1000,
    max_words=100,
    height=1000,
    random_state=1,
).generate(text)

fig = plt.figure(figsize=(25, 25))
plt.imshow(wordcloud, interpolation='bilinear')
plt.tight_layout(pad=0)
plt.axis("off")
plt.show()
I created multiple word clouds and saved the best one. Changing the value of the argument ‘random_state’ generates different outputs.
You can also customize the number of words in the word cloud by changing the value of the argument ‘max_words’.
The argument ‘interpolation=bilinear’ is used to make the image appear smoother.
Mask 2 word cloud:
I kept the background white and the text in color for mask 2 to add detail and interest to the typography.
# Colored text on a white background.
# Reuses the imports and `text` from the mask 1 snippet.
image = cv2.imread("D:/Photography/Typography/mask2.jpg", 1)  # load the mask as a 3-channel image
wordcloud = WordCloud(
    background_color='white',
    mask=image,
    mode="RGB",
    max_words=1200,
    width=1000,
    height=1000,
    random_state=2,
).generate(text)

fig = plt.figure(figsize=(25, 25))
plt.imshow(wordcloud, interpolation='bilinear')
plt.tight_layout(pad=0)
plt.axis("off")
plt.show()
I combined each word cloud with the mask to get the following result:
Final result after merging them in Photoshop.
The word cloud can be generated at any resolution, which makes it well suited to large-format printing. I’m exploring how I can insert words in order from a poem or a story so that this art will be more meaningful. In subsequent posts in this series, I will be talking about Artistic Style Transfer, 3D Image Inpainting, and a lot more.
I applied this technique in my photography work and the results are amazing!
Thank you for reading! I hope you enjoyed the article. If you want to keep up to date with my articles, please follow me.
I have shared the images and masks so that you can experiment with this yourself. (I reserve the rights to all the images used in this article; they were shot by me.)
About the author, Apratim Sahu:
Growth Hacker, B.Tech M.Tech IIT Kharagpur, Photographer
Passionate about Computer Vision and AI