Choose the Right Tool

Text Analysis

Methods

• Topic Modeling
• Information Retrieval
• Text Classification
• Sentiment Analysis
• Word Frequency Analysis
• Named Entity Recognition
• Collocation
• Word Embeddings
• Transformer Models
• Concordancing

Tools

  • Voyant Tools – web-based reading and analysis environment for digital texts
  • MALLET – Java-based package for statistical natural language processing, document classification, clustering, topic modeling, information extraction, and other machine learning applications to text
  • WordSeer 4 – text analysis environment that combines visualization, information retrieval, sensemaking, and natural language processing
  • Orange Text Mining – open-source machine learning and data visualization for novices and experts
  • AntConc – freeware corpus analysis toolkit for concordancing and text analysis
  • Lexos – text analysis tool offering both web-app and local installation options
  • Constellate – create corpora from JSTOR’s collections with a built-in Python analysis platform
  • HathiTrust Research Center Analytics – supports large-scale computational analysis of the HathiTrust Digital Library
  • HTRC Algorithms – tools for assembling and analyzing HathiTrust corpus collections (includes copyrighted items)
  • Extracted Features Dataset – dataset for non-consumptive analysis of HathiTrust corpus features
  • HathiTrust + Bookworm – visualize and analyze word usage trends in the HathiTrust corpus
  • HTRC Data Capsule – secure computing environment for text analysis on the HathiTrust corpus
  • Google Books Ngram Viewer – graph usage of terms/phrases over time
  • TAPoR 3 – sophisticated text analysis and retrieval tools

Visual Analysis

Tools

  • IIIF – standardize delivery of images and audio/visual files with interactive annotation capabilities
  • Loris – IIIF image server written in Python
  • Cantaloupe – open-source dynamic image server written in Java
  • Mirador 3 – open-source, multi-window image viewing platform with zoom, compare, and annotation features
  • Universal Viewer – a community-developed tool for viewing various file types
  • CatchPy – annotation server for IIIF image assets
  • Tropy – organize and annotate photos of archival resources
  • CVAT – open-source tool for image and video annotation

Data Visualizations

Tools

  • Tableau – query databases and spreadsheets to generate graph-type data visualizations
  • Flourish – create stunning charts, maps, and interactive content with no coding required
  • Datawrapper – create interactive, responsive & beautiful data visualizations
  • Google Looker Studio – convert data into customizable reports and dashboards
  • Infogr.am – infographic service with a free basic account and optional paid plans
  • Piktochart – convenient infographic editor
  • Canva – experiment with data visualization using hundreds of free design elements
  • Easel.ly – thousands of free infographic templates and design objects
  • D3.js – JavaScript library for bespoke data visualization

Digital Annotation

Tools

  • Hypothes.is – open annotation software running through a Chrome browser extension
  • Tropy – organize and annotate photos of archival resources
  • Recogito – annotate documents and photographs with a simple web app
  • Annotation Studio – collaborative web-based annotation tools from MIT HyperStudio
  • Neatline – add-on tools for Omeka with image and map annotation capabilities
  • Scalar – scholarly publishing software with built-in annotation tools for multiple media types

Spatial Analysis and Web Mapping

Tools

  • QGIS – a free and open-source desktop geographic information system application
  • CARTO – SaaS spatial analysis platform with GIS, web mapping, and data visualization features
  • Esri ArcGIS Online – create and share ArcGIS maps online
  • Neatline – tell stories with maps and timelines using this Omeka add-on
  • Google Maps API – create real-world experiences with Maps, Routes, and Places features
  • OpenLayers – put dynamic maps in any web page
  • Mapbox – build customizable maps for web, mobile, automotive, and AR
  • Story Maps – combine authoritative maps with narrative text, images, and multimedia
  • Palladio – visualize complex historical data with ease
  • Clio – educational app using GPS to connect users to the surrounding history
  • Leaflet – JavaScript library for interactive maps
  • Tilegrams – create tiled cartograms online
  • MapAList – create customized Google maps from address lists

Network Analysis

Tools

  • Gephi – visualization and exploration software for graphs and networks
  • Net.Create – an open-source tool for simultaneous multi-user network data entry
  • Palladio – visualize complex historical data with ease
  • Cytoscape – an open-source platform for visualizing complex networks
  • NodeXL – Microsoft Excel plugin for network visualization and analysis
  • NetworkX – Python package for network creation, manipulation, and analysis
  • Igraph – network analysis tools with emphasis on efficiency and portability
  • visNetwork – R package for network visualization using the vis.js library
  • D3.js – JavaScript library for bespoke data visualization

Timeline and Temporal Analysis

Tools

  • TimelineJS – open-source tool for building visually rich, interactive timelines
  • Chronos Timeline – render interactive timelines in Obsidian notes from simple Markdown
  • Neatline – tell stories with maps and timelines using this Omeka add-on
  • TimeGlider – web-based timeline builder
  • TimeToast – create timelines to add to websites or blogs
  • Viewshare – a free platform for generating interactive maps and timelines

Machine Learning

Tools


Database Development

Tools

  • FileMaker Pro – cross-platform relational database application
  • PostgreSQL – a powerful, open-source object-relational database system
  • MySQL – open-source relational database management system
  • MongoDB – cross-platform, document-oriented database program
  • Elasticsearch – distributed, RESTful search and analytics engine
  • Solr – open source, multi-modal search platform built on Apache Lucene
  • AWS DynamoDB – serverless, NoSQL database service
  • Neo4j – graph database management system
  • DataGrip – a cross-platform tool for relational and NoSQL databases
  • Postico – native Mac app for PostgreSQL
  • SQL Server Management Studio – configure, manage, and administer Microsoft SQL Server components
  • Corpora – database, REST API, and data collection interface in one

Data Cleaning

Tools

  • OpenRefine – an open-source desktop application for data cleanup and transformation
  • Tidyverse – collection of R packages for data science; its tidyr package provides functions for reshaping data into a consistent, tidy form
  • Pandas – open-source data analysis and manipulation tool built on Python

Project Management

Tools

  • Trello – web-based, kanban-style, list-making application
  • GitHub Projects – an adaptable spreadsheet and task board that integrates with GitHub issues
  • Asana – web and mobile work management platform
  • Monday.com – adaptable project management software
  • Airtable – spreadsheet-database platform with a variety of project management templates

Citation Management

Tools

  • Zotero – free tool to collect, organize, cite, and share research
  • EndNote – commercial reference management software for managing bibliographies
  • Mendeley – reference management software

Digital Collections

Tools

  • Omeka – free, flexible, open source web-publishing platform for libraries, museums, and scholarly collections
  • Scalar – free, open source authoring and publishing platform for born-digital scholarship
  • Story Maps – combine authoritative maps with narrative text, images, and multimedia
  • Mukurtu – a content management system for sharing information in culturally relevant ways
  • Neatline – tell stories with maps and timelines using Omeka add-on tools

Digital Publishing

Tools


Web Development

Tools

  • Drupal – free, open-source content management system
  • WordPress – web content management system
  • Google Sites – free, easy-to-use website builder
  • GitHub Pages – static website hosting served directly from a GitHub repository

Data Curation and Management

Tools

  • Git – distributed version control system for tracking file versions
  • Dataverse – open source research data repository software
  • GitHub – developer platform for creating, storing, managing, and sharing code

Programming Languages and Packages

Tools

  • Jupyter Notebooks – free software, open standards, and web services for interactive computing across all programming languages

Python

  • Natural Language Toolkit (NLTK) – leading platform for building Python programs to work with human language data
  • SpaCy – free, open-source library for advanced Natural Language Processing in Python
  • Gensim – free open-source Python library for representing documents as semantic vectors
  • Matplotlib – comprehensive library for creating static, animated, and interactive visualizations in Python
  • Seaborn – Python data visualization library based on matplotlib
  • Plotly – open source graphing library

R

  • Quanteda – R package for managing and analyzing text
  • Tidytext – text mining with R
  • SpaCyR – provides a convenient R wrapper around the Python spaCy package
  • Ggplot2 – system for declaratively creating graphics based on The Grammar of Graphics
  • Plotly – open source graphing library

Coding


Transcription

Tools

  • ABBYY FineReader – optical character recognition (OCR) software
  • Scripto – open-source tool for viewing and transcribing digital files
  • OTranscribe – free web-based audio transcription interface
  • eScriptorium – digital text production pipeline for print and handwritten texts using machine learning
  • Transkribus – AI platform for automatically recognizing text, layout, and structure in historical documents
  • Amazon Textract – machine learning service that automatically extracts text and data from scanned documents
  • Google Document AI – create document processors that automate tasks and improve data extraction