TextLab






A Text Analytics Toolkit (TextAnalyticsLab/TextLab) for Python

Project description

Current release: TextLab [v0.1.5]

TextAnalyticsLab (TextLab) - a collection of Text Analytics tools for Python.

Introduction

'TextAnalyticsLab'/'TextLab' is a Python package providing a set of text analytics tools for data mining and machine learning projects and for end-to-end text analytics application development. It is compatible and interoperates with the data analysis and manipulation library Pandas, the natural language processing library nltk, the Machine Learning ToolKit (pymltoolkit|mltk), and many other AI and machine learning platforms.

Installation

Install the package from PyPI with pip:

    pip install TextLab

If the installation fails with dependency issues, run the same command with --no-dependencies.

Functions

  • Text Similarity
  • OCR (A wrapper to convert image documents to text using Tesseract-OCR and Ghostscript)
  • Text Mining and Information Extraction (in v0.2.0)
  • Cleaning Text content
  • Web Scraping (in v0.1.6)
  • Email Data Extraction
  • Classification of Text Content (in v0.2.0)

Usage

Warning: Python Variable, Function or Class names

The Python interpreter has a number of built-in functions (https://docs.python.org/3/library/functions.html). It is possible to overwrite their definitions in your own code without the interpreter raising any warning. Therefore, AVOID THESE NAMES as your variable, function, or class names.

abs           all           any           ascii         bin           bool          bytearray     bytes
callable      chr           classmethod   compile       complex       delattr       dict          dir
divmod        enumerate     eval          exec          filter        float         format        frozenset
getattr       globals       hasattr       hash          help          hex           id            input
int           isinstance    issubclass    iter          len           list          locals        map
max           memoryview    min           next          object        oct           open          ord
pow           print         property      range         repr          reversed      round         set
setattr       slice         sorted        staticmethod  str           sum           super         tuple
type          vars          zip           __import__

If you accidentally overwrite one of the built-in functions (e.g. list), delete your own definition of that name (for example, del list) to bring back the built-in definition.
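A minimal sketch of shadowing and restoring a built-in, using only the standard library. It rebinds the name from the builtins module, which works in any scope; in an interactive session or at module level, del list has the same effect because it removes the shadowing name.

```python
import builtins

# Accidentally shadow the built-in `list` with an ordinary variable.
list = [1, 2, 3]
shadowed = not callable(list)  # True: `list` is now a plain list object

# Restore the built-in by rebinding the name from the builtins module.
# (At module level or in a REPL, `del list` would also remove the shadow.)
list = builtins.list

numbers = list(range(3))  # the built-in constructor works again
print(shadowed, numbers)
```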

Text Analytics Example

Text Similarity

text1

text2

Output:
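As a stand-in illustration of what a text-similarity score computes, here is a minimal sketch of cosine similarity over word counts using only the standard library. This is not TextLab's own API; the function names tokenize and cosine_similarity are hypothetical and chosen for this example.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lower-case the text and split it into alphanumeric word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine_similarity(text_a, text_b):
    """Cosine similarity between the word-count vectors of two texts."""
    counts_a = Counter(tokenize(text_a))
    counts_b = Counter(tokenize(text_b))
    # Dot product over the words the two texts share.
    dot = sum(counts_a[w] * counts_b[w] for w in counts_a.keys() & counts_b.keys())
    norm_a = math.sqrt(sum(c * c for c in counts_a.values()))
    norm_b = math.sqrt(sum(c * c for c in counts_b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty text has no direction to compare
    return dot / (norm_a * norm_b)


text1 = "The cat sat"
text2 = "The cat ran"
score = cosine_similarity(text1, text2)
print(round(score, 3))
```

A score of 1.0 means the two texts have identical word-count proportions, and 0.0 means they share no words.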

Processing Documents

OCR Test

PDF To Image

Process PDF file and store results in Pandas DataFrame

Convert PDF or Image file to text

Convert PDF to Image DataFrame


Apply OCR on Images DataFrame
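The PDF-to-image-to-text flow can be sketched by calling the Ghostscript and Tesseract command-line tools directly. This is an assumption-laden illustration, not TextLab's wrapper: the helper names are hypothetical, both tools are assumed to be installed and on PATH as gs and tesseract (the Ghostscript executable is named differently on Windows), and the page-%03d.png naming is an arbitrary choice.

```python
import subprocess
from pathlib import Path


def ghostscript_command(pdf_path, out_dir, dpi=300):
    """Build a Ghostscript call that renders each PDF page to a PNG file."""
    pattern = str(Path(out_dir) / "page-%03d.png")
    return ["gs", "-dNOPAUSE", "-dBATCH", "-sDEVICE=png16m",
            f"-r{dpi}", f"-sOutputFile={pattern}", str(pdf_path)]


def tesseract_command(image_path, lang="eng"):
    """Build a Tesseract call that writes the recognized text to stdout."""
    return ["tesseract", str(image_path), "stdout", "-l", lang]


def ocr_pdf(pdf_path, out_dir, dpi=300, lang="eng"):
    """Render a PDF to page images, OCR each page, and return one row per page."""
    Path(out_dir).mkdir(parents=True, exist_ok=True)
    subprocess.run(ghostscript_command(pdf_path, out_dir, dpi), check=True)
    rows = []
    for image in sorted(Path(out_dir).glob("page-*.png")):
        result = subprocess.run(tesseract_command(image, lang),
                                check=True, capture_output=True, text=True)
        rows.append({"image": image.name, "text": result.stdout})
    return rows


gs_cmd = ghostscript_command("doc.pdf", "out", dpi=200)
ocr_cmd = tesseract_command("page-001.png")
```

The list of dictionaries returned by ocr_pdf can be passed to pandas.DataFrame(rows) to get one row per page image.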

Email Data Extraction

EML file
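A minimal sketch of pulling the common fields out of an EML message with Python's standard email package. It does not use TextLab's own functions; the function name extract_email_fields and the sample message are hypothetical.

```python
from email import message_from_string, policy

SAMPLE_EML = """\
From: Alice <alice@example.com>
To: Bob <bob@example.com>
Subject: Quarterly report
Date: Sat, 21 Dec 2019 10:00:00 +0000
MIME-Version: 1.0
Content-Type: text/plain; charset="utf-8"

Hi Bob,
The report is ready.
"""


def extract_email_fields(raw_message):
    """Parse raw EML text and return headers and plain-text body as a dict."""
    msg = message_from_string(raw_message, policy=policy.default)
    # get_body() picks the best matching part; here we only ask for plain text.
    body_part = msg.get_body(preferencelist=("plain",))
    body = body_part.get_content().strip() if body_part is not None else ""
    return {
        "from": str(msg["From"]),
        "to": str(msg["To"]),
        "subject": str(msg["Subject"]),
        "date": str(msg["Date"]),
        "body": body,
    }


fields = extract_email_fields(SAMPLE_EML)
print(fields["subject"])
```

To read a .eml file from disk, open it in binary mode and use email.message_from_binary_file with the same policy.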

Read from Exchange Web Services (using exchangelib)

License


Text Analytics Project Timeline

  • 2018-07-10 [v0.0.1]: Initial set of functions for text data analysis published to GitHub (https://github.com/sptennak/TextAnalytics).
  • 2019-01-03 [v0.0.2]: More functions for data exploration, including web scraping and geospatial data analysis, created for the IBM Coursera Data Science Capstone Project and published to GitHub (https://github.com/sptennak/Coursera_Capstone).
  • 2019-07-20 [v0.1.2]: First release of the 'TextLab' Text Analytics Python package to PyPI.
  • 2019-11-10 [v0.1.3]: Enhancements and bug fixes. Integrated a wrapper to convert image documents to text using Tesseract-OCR and Ghostscript. This module was developed in the initial stage of the IBM Coursera Advanced Data Science Professional Certificate Capstone Project (https://github.com/sptennak/IBM-Coursera-Advanced-Data-Science-Capstone), but was not used in the final version because text analytics was omitted from the final deliverable.
  • 2019-11-16 [v0.1.4]: Bug Fixes, Enhanced Document Processing functions. Integrated Document Server API with OCR function.
  • 2019-12-21 [v0.1.5]: Integrated email data extraction functions and cleaning text content.

Future Release Plan

  • TBD [v0.1.6]: Integrate web scraping functions. Comprehensive documentation. Major bug-fix version of the initial release with some enhancements.
  • TBD [v0.1.6]: Enhance information extraction functionality. Add support for more of the available open-source tools (OCR, image converters, etc.).
  • TBD [v0.2.0]: Integrate Text Mining, Information Extraction, and Classification.
  • TBD [v0.3.0]: End-to-end Text Analytics Application Development

References

Other helpful Text Analytics and Natural Language Processing Python libraries

