License

BSD 3-Clause License

Copyright (c) 2020, the kwx developers. All rights reserved.

Redistribution and use in source and binary forms, with or without
modification, are permitted provided that the following conditions are met:

* Redistributions of source code must retain the above copyright notice, this
  list of conditions and the following disclaimer.

* Redistributions in binary form must reproduce the above copyright notice,
  this list of conditions and the following disclaimer in the documentation
  and/or other materials provided with the distribution.

* Neither the name of the copyright holder nor the names of its
  contributors may be used to endorse or promote products derived from
  this software without specific prior written permission.

THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS"
AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE
DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE LIABLE
FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR
SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER
CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY,
OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.

Change log

kwx tries to follow semantic versioning, a MAJOR.MINOR.PATCH versioning scheme in which we increment the:

  • MAJOR version when we make incompatible API changes

  • MINOR version when we add functionality in a backwards compatible manner

  • PATCH version when we make backwards compatible bug fixes
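As a rough illustration of these rules, the hypothetical helper below (not part of kwx's API) bumps one component of a three-part version string. It assumes the standard semantic versioning convention that lower-order components reset to zero when a higher one is incremented. Four-part tags such as 0.1.7.3 fall outside strict MAJOR.MINOR.PATCH, so this sketch rejects them.

```python
def bump_version(version: str, part: str) -> str:
    """Return `version` with `part` ("major", "minor" or "patch") incremented.

    Hypothetical helper for illustration only. Lower-order components are
    reset to zero, following the semantic versioning convention.
    """
    components = version.split(".")
    if len(components) != 3:
        # Only strict MAJOR.MINOR.PATCH strings are handled in this sketch
        raise ValueError(f"expected MAJOR.MINOR.PATCH, got {version!r}")
    major, minor, patch = (int(c) for c in components)
    if part == "major":
        return f"{major + 1}.0.0"
    if part == "minor":
        return f"{major}.{minor + 1}.0"
    if part == "patch":
        return f"{major}.{minor}.{patch + 1}"
    raise ValueError(f"unknown version part: {part!r}")


if __name__ == "__main__":
    print(bump_version("0.1.8", "patch"))
    print(bump_version("0.1.8", "minor"))
    print(bump_version("0.1.8", "major"))
```

For example, a major bump of 0.1.8 yields 1.0.0, while a patch bump yields 0.1.9.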

kwx 1.0.0 (December 28th, 2021)

kwx 0.1.8 (April 29th, 2021)

Changes include:

  • Support has been added for gensim 3.8.x and 4.x

  • Dependencies in the requirements and environment files are now condensed

  • An alert is now shown to users when the corpus size is too small for the number of topics

  • An import error for pyLDAvis was fixed

kwx 0.1.7.3 (March 30th, 2021)

Changes include:

  • Switched over to a src structure

  • Removed the lda_bert method because its dependencies were causing breaks

  • Code quality is now checked with Codacy

  • Extensive code formatting to improve quality and style

  • Bug fixes and a more explicit use of exceptions

  • More extensive contributing guidelines

  • Tests now use random seeds and are thus more robust

kwx 0.1.5 (March 15th, 2021)

Changes include:

  • Keyword extraction and selection are now separate, so the model does not need to be rerun to get new keywords

  • Keyword extraction and cleaning are now fully separate processes

  • kwargs for sentence-transformers BERT, LDA, and TFIDF can now be passed

  • The cleaning process is verbose and uses multiprocessing

  • The user has greater control over the cleaning process

  • The code has been reformatted to make the process clearer

kwx 0.1.0 (February 17th, 2021)

First stable release of kwx

Changes include:

  • Full documentation of the package

  • Virtual environment files

  • Bug fixes

  • Extensive testing of all modules with GitHub Actions and Codecov

  • Code of conduct and contribution guidelines

kwx 0.0.2.2 (January 31st, 2021)

The minimum viable product of kwx:

  • Users are able to extract keywords using the following methods:

    • Most frequent words

    • TFIDF words unique to one corpus when compared to others

    • Latent Dirichlet Allocation

    • Bidirectional Encoder Representations from Transformers

    • An autoencoder application of LDA and BERT combined

  • Users are able to tell the model to remove certain words to fine-tune results

  • Support is offered for a universal cleaning process in all major languages

  • Visualization techniques to display keywords and topics are included

  • Outputs can be cleanly organized in a directory or zip file

  • Runtimes for topic number comparisons are estimated using tqdm