Consolidates NLP and AutoML, adds support for PyKX 2.5.3 with Python …
…3.11. (#109) * added link to documentation * added link to documentation * don't print load messages in quiet mode * docker image with nlp dependencies installed * build docker image on travisCI * update docker output * slack notification * updated README * updated README * updated README * removed finding years as there are too many false positives in findDates * adding tests * added tests * updated travis * updated paths * added tests * Squashed commit of the following: commit 86dde8886648f4c0199ca25696a68d97fecf30a7 Author: Fionnuala Carr <[email protected]> Date: Mon Jul 9 11:38:13 2018 +0100 cleaned up tests commit 2c3c612e70a92cd0d7c23a331198d9f351868c35 Author: Fionnuala Carr <[email protected]> Date: Mon Jul 9 11:11:48 2018 +0100 changed ranges of dates,q error commit 3828ac2c55c3ce0e89788362db527fb5305b1f60 Author: Fionnuala Carr <[email protected]> Date: Fri Jun 29 11:54:08 2018 +0100 modified test commit 32ed5525c697200cbd011826bcbeab0b3e9b9e8a Author: Fionnuala Carr <[email protected]> Date: Fri Jun 29 11:46:08 2018 +0100 moved tests commit aa56fbc3ba6459ee1d6df788c9f3890a6830fdb2 Author: Fionnuala Carr <[email protected]> Date: Fri Jun 29 11:36:58 2018 +0100 changed path commit 1b006a946bc00e195a216d317a60e3ec2d9f92fd Author: Fionnuala Carr <[email protected]> Date: Fri Jun 29 11:25:20 2018 +0100 cd back commit c83255288dfc0dfac3eca39bb636bc76015b1b5e Author: Fionnuala Carr <[email protected]> Date: Fri Jun 29 11:22:13 2018 +0100 test embedPy runs commit 1ad7c10a651eb64da0e00aa12de3f4101b269229 Author: Fionnuala Carr <[email protected]> Date: Fri Jun 29 10:50:21 2018 +0100 testing commit f9327754052c95f26414c25d232c9703f316206d Author: Fionnuala Carr <[email protected]> Date: Fri Jun 29 10:48:38 2018 +0100 embedpy commit a036ac2e51b9f42a63c8faf8e86c2691b38ea424 Author: Fionnuala Carr <[email protected]> Date: Fri Jun 29 10:46:34 2018 +0100 embed commit 2fcc4402bc6b6993b4e4735fe6415def84ecc368 Author: Fionnuala Carr <[email protected]> Date: Fri Jun 29 
10:38:34 2018 +0100 embedPy commit 36cdbef8c469750bb0448688e7a767647b29d423 Author: Fionnuala Carr <[email protected]> Date: Fri Jun 29 10:33:48 2018 +0100 embedPy commit 2b84902e70ee84b863a76ac39844c6d3e0017e9d Author: Fionnuala Carr <[email protected]> Date: Fri Jun 29 10:30:28 2018 +0100 embedPy commit 66d3ab44d97b7f12ea2921fd15efb1b1c69df434 Author: Fionnuala Carr <[email protected]> Date: Fri Jun 29 10:25:04 2018 +0100 embedPy commit 58f6ce0413992c78ce05e250860b9eac2de1ae13 Author: Fionnuala Carr <[email protected]> Date: Fri Jun 29 10:20:45 2018 +0100 embedPy commit 905511d833ad29ffc5bb159c40b75c74a7788057 Author: Fionnuala Carr <[email protected]> Date: Fri Jun 29 10:17:59 2018 +0100 embedPy commit 50ec1d4d5189c703f9cb11c918a1cc19b85c0c62 Author: Fionnuala Carr <[email protected]> Date: Fri Jun 29 10:06:36 2018 +0100 test commit b9b4ec17333f59082fceee57d32914dd7f3217f9 Author: Fionnuala Carr <[email protected]> Date: Fri Jun 29 09:58:00 2018 +0100 test commit 7d6fcf7582c8681aaa0664d192b009d91262781b Author: Fionnuala Carr <[email protected]> Date: Fri Jun 29 09:48:15 2018 +0100 test commit bd6c713e62db8bfc0494a4b990a9ca91ff7e7d16 Author: Fionnuala Carr <[email protected]> Date: Fri Jun 29 09:16:32 2018 +0100 tests commit f701c088e83ed7b2243eef4c99618ce32ffd8626 Author: Fionnuala Carr <[email protected]> Date: Fri Jun 29 09:09:31 2018 +0100 removed builds commit 26d056251cc79a2c38356324dcfd3af345c2af55 Author: Fionnuala Carr <[email protected]> Date: Thu Jun 28 18:50:10 2018 +0100 add test commit 66ab09c98eeb142c7ea1b4a1f6a410a41a67dfba Author: Fionnuala Carr <[email protected]> Date: Thu Jun 28 18:38:51 2018 +0100 tests commit e23b746cf1aa96256429fc252aea7036e52a02d2 Author: Fionnuala Carr <[email protected]> Date: Thu Jun 28 18:29:01 2018 +0100 test commit b3e889493228411e6d1485dfefd25681e31b8899 Author: Fionnuala Carr <[email protected]> Date: Thu Jun 28 18:28:34 2018 +0100 test commit aecb40447e48957dcaaf8e102673915d2968e7ef Author: Fionnuala Carr <[email 
protected]> Date: Thu Jun 28 18:24:45 2018 +0100 tests commit 18f4417e0c0123981ea613d9ba4ffff61e9211b1 Author: Fionnuala Carr <[email protected]> Date: Thu Jun 28 18:18:22 2018 +0100 add tests commit 79c35c63e4bf647242258b3613e7d573b31883a8 Author: Fionnuala Carr <[email protected]> Date: Thu Jun 28 17:14:10 2018 +0100 non conda commit a6a22020ad545306509f926c965d96c7f092e82e Author: Fionnuala Carr <[email protected]> Date: Thu Jun 28 16:47:17 2018 +0100 testing commit 0cb7b2ddd0326b3504d5e2478b3e478c39582d94 Author: Fionnuala Carr <[email protected]> Date: Thu Jun 28 16:21:54 2018 +0100 testing commit 6e2d2493d3c07a9371a1c830e9038414b85976ad Author: Fionnuala Carr <[email protected]> Date: Thu Jun 28 16:19:02 2018 +0100 testing commit 867d432c080d3ae78b5fa6a90a050e883627c72e Author: Fionnuala Carr <[email protected]> Date: Thu Jun 28 16:17:35 2018 +0100 testing commit 8411f070ed3a791e3c92c3b1cb133fe24a14e03d Author: Fionnuala Carr <[email protected]> Date: Thu Jun 28 16:16:26 2018 +0100 testing commit 8325b17e1168781eddabd663cc255e9be8168d7d Author: Fionnuala Carr <[email protected]> Date: Thu Jun 28 16:14:39 2018 +0100 testing commit 23d17da19d06d4b0068f89a8e27c9e1747b907d1 Author: Fionnuala Carr <[email protected]> Date: Thu Jun 28 16:08:03 2018 +0100 testing commit 162387ee28a74144aee8e69a9253bc98e34a9b18 Author: Fionnuala Carr <[email protected]> Date: Thu Jun 28 16:05:19 2018 +0100 testing commit 14db1b68d22b4e86906022029b11fdf3e78cc1af Author: Fionnuala Carr <[email protected]> Date: Thu Jun 28 16:02:15 2018 +0100 testing commit 9e88a0c35980f232b1d5ab3098e484ee98617aa8 Author: Fionnuala Carr <[email protected]> Date: Thu Jun 28 15:52:46 2018 +0100 testing commit c423e96384c69277516be8a16880f9081beda129 Author: Fionnuala Carr <[email protected]> Date: Thu Jun 28 15:45:58 2018 +0100 testing commit 4be9539013403c0e35d67a17c4aee37040d99033 Author: Fionnuala Carr <[email protected]> Date: Thu Jun 28 15:44:35 2018 +0100 testing commit 
4bfc0f1955cd4ff068c7cb71c6cf13e431622c6c Author: Fionnuala Carr <[email protected]> Date: Thu Jun 28 15:41:33 2018 +0100 testing commit 7687973cf4fce5101578458049cb2cf80788b54e Author: Fionnuala Carr <[email protected]> Date: Thu Jun 28 15:35:24 2018 +0100 testing commit 2892e37f735b5ea7e71807c10f2359247319f27e Author: Fionnuala Carr <[email protected]> Date: Thu Jun 28 15:30:05 2018 +0100 testing commit 12b25e0d4d702b38e80a24ed06f405c944faadf9 Author: Fionnuala Carr <[email protected]> Date: Thu Jun 28 13:38:14 2018 +0100 testing commit 7c609469090784d78bcc450f2fb15bc1ccf8c055 Author: Fionnuala Carr <[email protected]> Date: Thu Jun 28 12:25:36 2018 +0100 testing commit d162961f17a530bb795127d4523a8062b288ce5b Author: Fionnuala Carr <[email protected]> Date: Thu Jun 28 12:13:33 2018 +0100 testing commit d6018f7d3626fc94e9160833fe27b40ca3a9ae9f Author: Fionnuala Carr <[email protected]> Date: Thu Jun 28 11:49:42 2018 +0100 testing commit a67fa1348c811659aa63f8a33acebb007b8426af Author: Fionnuala Carr <[email protected]> Date: Thu Jun 28 09:59:38 2018 +0100 testing commit a48dc73c939c02950cadff7f9bcdadc57d6e7974 Author: Fionnuala Carr <[email protected]> Date: Thu Jun 28 09:40:29 2018 +0100 testing commit 3d728ebfc799dd8655f7cde8e149f0287dbececc Author: Fionnuala Carr <[email protected]> Date: Thu Jun 28 09:36:57 2018 +0100 testing commit a593150467bae8385a9c4cb2d7a4a458fb3fa504 Author: Fionnuala Carr <[email protected]> Date: Wed Jun 27 18:23:28 2018 +0100 testing commit 4258fdbbd342cc342f8021a313a872b95589a30b Author: Fionnuala Carr <[email protected]> Date: Wed Jun 27 17:17:50 2018 +0100 testing commit d2e2312dc17a69421952ad59af68ef29d84f9c34 Author: Fionnuala Carr <[email protected]> Date: Wed Jun 27 17:09:45 2018 +0100 testing commit c2d442d4bb1cc74f37507f3bf5f8662edb7948f5 Author: fionncarr <[email protected]> Date: Tue Jun 26 11:42:06 2018 +0000 tests commit 8c7cd25b3b46d779fbbfcdcdd6a7df89e06348e9 Author: fionncarr <[email protected]> Date: Tue Jun 26 
11:16:08 2018 +0000 tests commit 254c6411183fb39074d8af603b6b28eef09581b5 Author: fionncarr <[email protected]> Date: Tue Jun 26 10:58:11 2018 +0000 tests commit 4423021a7dea2696a12f02e0dd9ecc74cd8bbbae Author: fionncarr <[email protected]> Date: Tue Jun 26 10:37:17 2018 +0000 test commit 3e63b5fc7d9d59b205b7554b35095598559c7fcf Author: fionncarr <[email protected]> Date: Tue Jun 26 10:04:56 2018 +0000 tests commit 8a34ed99cc31e397d8cc8be2af79b04754cc3662 Author: fionncarr <[email protected]> Date: Tue Jun 26 09:26:40 2018 +0000 updated tests commit 382ccd710860920a25df413b4e7b05321cc8bc9e Author: fionncarr <[email protected]> Date: Tue Jun 26 08:42:03 2018 +0000 checking what test fails commit 1859636d67a1ef35b8e798e4fe879f0e0019ed53 Author: fionncarr <[email protected]> Date: Mon Jun 25 17:33:10 2018 +0000 downloading spacy en model commit e6713fa0c3fc81a3391ad200a6f63b19c41eee86 Author: fionncarr <[email protected]> Date: Mon Jun 25 17:10:45 2018 +0000 changed email commit 200d7c9bba262cc5654028cbdabf189a721f13a8 Author: Fionnuala Carr <[email protected]> Date: Mon Jun 25 17:20:06 2018 +0100 deleted email commit 8915a135a9e176af8f8deb0811bdf5c40cab1e26 Author: Fionnuala Carr <[email protected]> Date: Mon Jun 25 17:07:13 2018 +0100 added email commit d0613bdc6d5ae1e1934809b0e379cbe440d057c7 Author: Fionnuala Carr <[email protected]> Date: Mon Jun 25 16:35:57 2018 +0100 updated path commit 3c10c52eca07e880f6c4a2ce791a8c54cd4ab28d Author: Fionnuala Carr <[email protected]> Date: Mon Jun 25 16:13:25 2018 +0100 pip not pip3 commit 0a5ce933611647663d0f0b01d472b02b963aa28a Author: Fionnuala Carr <[email protected]> Date: Mon Jun 25 16:10:37 2018 +0100 remove comments commit 5cd62ff1eabc7ae37c07f9c0c29bc674438191d4 Author: Fionnuala Carr <[email protected]> Date: Mon Jun 25 16:06:41 2018 +0100 test with kdb commit a560aef96e1747a746ac1bf12bd6b2d271d4972b Author: Fionnuala Carr <[email protected]> Date: Mon Jun 25 15:34:37 2018 +0100 osx and linux * decoded evn vars * 
updated path for vader * modified load emails function, added tests for the function and added scripts to spread expensive computations * checked where it was breaking * removed test that it breaks on * adding scripts to make a release of the code and fixed bug in the parser with lemmas * corrected typo * recommiting because docker doesnt work * fixed bug that was causing findDates to crash, lemmas return correct result and jaroWinkler works for 3 or less lettered words. Created tested for these functions * delete files * reversed some of the files so it would be able to merge to master * added new functionality to extract rtf text in an email * changed tests to accommodate new format in emails * fixed .nlp.loadfile function to allow it to load in files in windows * unsilence tests * silenced tests * fixed tfidf to match python * added alternative apostrophe to stopword contractions * updated TFIDF code * added ability for alpha languages * deleted stopwords * tests * changed test files * test.q now the same as embedpy test.q * testing * alpha languages support,detect languages function added and TFIDF update * buildtest * buildtest * buildtest * buildtest * test * test * test * test * test * test * test * added conda cmd * added conda cmd * added conda cmd * a * changed back to original test.q * test * test * test * test * test * test * update * update * update * update * update * update * update * update * update * update * update * update * update * update * update * update * update * update * update * update * update * update2 * update2 * update3 * update3 * update3 * update3 * update3 * update3 * update3 * update3 * update3 * update3 * update3 * update3 * update3 * update3 * update3 * update3 * update4 * update5 * update5 * update5 * update6 * update6 * update6 * update6 * update6 * update7 * update8 * update9 * update9 * update10 * update11 * update12 * update13 * update14 * update15 * update16 * update17 * update18 * update18 * update19 * update19 * update * 
added user: to curl function * added user: to curl function * commit * getToFrom function now looks for multiple senders if payload is a table * change appveyor settings * no changes * calls embedpy tests * calls embedpy tests * added in PennPOS to catch symbols * loading init.q added to test scripts * delete test.q * testing * testing * testing * laoding init.q added to test scripts * dev (#15) * tests * changed test files * test.q now the same as embedpy test.q * testing * alpha languages support,detect languages function added and TFIDF update * buildtest * buildtest * buildtest * buildtest * test * test * test * test * test * test * test * added conda cmd * added conda cmd * added conda cmd * a * changed back to original test.q * update * added user: to curl function * added user: to curl function * commit * getToFrom function now looks for multiple senders if payload is a table * no changes * calls embedpy tests * calls embedpy tests * added in PennPOS to catch symbols * loading init.q added to test scripts * added slack notification * fixed tests for spacy update * fixed tests for spacy update * dev * fixed tests for spacy update * Dianedev (#18) (#19) * tests * changed test files * test.q now the same as embedpy test.q * testing * alpha languages support,detect languages function added and TFIDF update * buildtest * buildtest * buildtest * buildtest * test * test * test * test * test * test * test * added conda cmd * added conda cmd * added conda cmd * a * changed back to original test.q * update * added user: to curl function * added user: to curl function * commit * getToFrom function now looks for multiple senders if payload is a table * change appveyor settings * no changes * calls embedpy tests * calls embedpy tests * added in PennPOS to catch symbols * loading init.q added to test scripts * delete test.q * testing * testing * testing * laoding init.q added to test scripts * added slack notification * fixed tests for spacy update * fixed tests for spacy 
update * dev * fixed tests for spacy update * added spacy hunspell * removed .i. from funcs in docs * removed .i. from funcs in docs * removed .i. from funcs in docs * removed .i. from funcs in docs * removed .i. from funcs in docs * removed .i. from funcs in docs * embedPy * code.kx link updates * updated spell check * updated spell check * cleaned up format * update file path for init * updated init.q * fixed spacy_hunspell * install spacy_hunspell error * fixed travis and appveyor * added regex funcs * added tensorflow funcs * added alpha lang instructions, fixed parser for single opts input * added tf tests * clean up and added tests * removed tensorflow, updated langid * added instructions for spacy_hunspell * updated travis file * add conda-forge to docker * changed docker instructions * added pip -q to travis file w * run tests on spacy version 2.2.1 * fix merging (#22) * Initial commit * This commit includes the initial beta version of the automated machine learning framework. * Can be used for Normal/FRESH tasks in regression or classification problem. 
* Designed to be flexible in nature to kdb devs and ml engineers * Includes testing procedures via travis and appveyor * Addition of hidden travis yml file and issue templates * Update to code commenting, removal of unneeded plotting functions, minor readme mod * Code refactor to move to dict input and clean up aml.q (#2) * Update README.md * Update README.md * Fix to 'locals error and addition of target check (#4) * reduction in local variable to stop 'locals error in kdb+<3.6 * Update to number of locals in aml.q, addition of target check and pandas requirement * Change to target encoding location for symbols * Removal of 3.6 requirement * Train-test-split naming and check for existence of save-default file (#5) * reduction in local variable to stop 'locals error in kdb+<3.6 * Update to number of locals in aml.q, addition of target check and pandas requirement * Change to target encoding location for symbols * Removal of 3.6 requirement * Update to conventions for train-validate-test, check to see if default file already exists * Fix to link in contributing.md * Addition of travis and release tags * Update .travis.yml * Update package.bat * Update getkdb.bat * Update getkdb.bat * Update package.bat * Update to docker image (#6) * Docker image had not been initialising correctly and was missing the ml toolkit * wording update (#7) * Update (#23) * update merge * update function name * Update to findDates func to check for the word "of" or "in" between dates, months or years. Tests also added to account for this change (#24) * update merge * update function name * fixed findDates function to account for the word of or in between dates, months and years * update to infinity replace logic (#8) * Explicit closing of figures to reduce process memory usage (#9) * Upd infreplace (#10) * update to infinity replace logic * fix to bug in infinity replace for float * Update travis/appveyor files. 
Removed sys argv statement due to embedPy update (#25) * update merge * update function name * fixed findDates function to account for the word of or in between dates, months and years * removed sys argv statement * fix appveyor and travis files * fix copy error * removed getembedpy * v0.2.0 additions (#11) * addition of latex support and torch functionality * merged nlp into new version * update to import and checking functionality * removal of hack for save paths * commenting to init,no longer defining the .automml.p namespace, functions won't be callable unless they're all available anyway * removal of unnecessary type check, more readable choice of first element, checknlp -> validnlp, or not and for check * space between separate columns, rename of util to prep. ... * refactoring of nlp preprocessing execution * splitting of preprocessing functions into sub folders, models folder now splits into sections * null and constant drop, simplification of percentage calculation in stop tab function * wording update * Explicit closing of figures to reduce process memory usage * random/sobol hyperparam search * grid/random hyperparam files * old hyperparam file deleted * kneighbors fix * update to hyperparameter generation functionality for automl * random search moved to ml * fix for reading in pdict with new hp key * tests + fix to proc.hp.psearch for sobol * clearer install instructions for additional modules * latex requirements * readme updates * report feature extraction * python checks * fixed nlp terminal printouts * nlp refactor * added regex search and user config option for w2v * word2vec type change. 
Fixed string error * added NLP tests * installed extra requirements for tests * moved requirements * updated scripts to support pytorch models without tensorflow * fix to error in non latex report generation * updated travis/appveyor files * saving with failing latex generation results in movement to report directory without rectification * typo in second system command for current directory * updated for mproc loading to work with both keras and torch * V2nlp review (#10) * pytorch updates * pytorch updates * refactor to ensure file movement is correct, failing tests are caught and torch tests can be ran * pip install for appveyor tests Co-authored-by: Conor McCarthy <[email protected]> Co-authored-by: Conor McCarthy <[email protected]> * added namespace (#13) * added namespace * added warning for w2v randomization. Added w2vitem function that was previously redundant * update to pip install rather than conda in docker images Co-authored-by: Dianeod <[email protected]> Co-authored-by: Deanna Morgan <[email protected]> Co-authored-by: dmorgankx <[email protected]> Co-authored-by: Conor McCarthy <[email protected]> Co-authored-by: Conor McCarthy <[email protected]> Co-authored-by: Dianeod <[email protected]> Co-authored-by: Conor McCarthy <[email protected]> * Testing minor changes (#12) * addition of latex support and torch functionality * merged nlp into new version * update to import and checking functionality * removal of hack for save paths * commenting to init,no longer defining the .automml.p namespace, functions won't be callable unless they're all available anyway * removal of unnecessary type check, more readable choice of first element, checknlp -> validnlp, or not and for check * space between separate columns, rename of util to prep. ... 
* refactoring of nlp preprocessing execution * splitting of preprocessing functions into sub folders, models folder now splits into sections * null and constant drop, simplification of percentage calculation in stop tab function * wording update * Explicit closing of figures to reduce process memory usage * random/sobol hyperparam search * grid/random hyperparam files * old hyperparam file deleted * kneighbors fix * update to hyperparameter generation functionality for automl * random search moved to ml * fix for reading in pdict with new hp key * tests + fix to proc.hp.psearch for sobol * clearer install instructions for additional modules * latex requirements * readme updates * report feature extraction * python checks * fixed nlp terminal printouts * nlp refactor * added regex search and user config option for w2v * word2vec type change. Fixed string error * added NLP tests * installed extra requirements for tests * moved requirements * updated scripts to support pytorch models without tensorflow * fix to error in non latex report generation * updated travis/appveyor files * saving with failing latex generation results in movement to report directory without rectification * typo in second system command for current directory * updated for mproc loading to work with both keras and torch * V2nlp review (#10) * pytorch updates * pytorch updates * refactor to ensure file movement is correct, failing tests are caught and torch tests can be ran * pip install for appveyor tests Co-authored-by: Conor McCarthy <[email protected]> Co-authored-by: Conor McCarthy <[email protected]> * added namespace (#13) * added namespace * added warning for w2v randomization. 
Added w2vitem function that was previously redundant * update to pip install rather than conda in docker images * addition of smaller run of trials for sobol/random * reduction in number of rows for nlp tests (timeout) Co-authored-by: Dianeod <[email protected]> Co-authored-by: Deanna Morgan <[email protected]> Co-authored-by: dmorgankx <[email protected]> Co-authored-by: Conor McCarthy <[email protected]> Co-authored-by: Conor McCarthy <[email protected]> Co-authored-by: Dianeod <[email protected]> Co-authored-by: Conor McCarthy <[email protected]> * NLP dataset to allow reduced rows * AutoML Refactor Version 0.3.0 (#14) * Version 0.3.0 update (#13) * addition of latex support and torch functionality * merged nlp into new version * update to import and checking functionality * removal of hack for save paths * commenting to init,no longer defining the .automml.p namespace, functions won't be callable unless they're all available anyway * removal of unnecessary type check, more readable choice of first element, checknlp -> validnlp, or not and for check * space between separate columns, rename of util to prep. ... 
* refactoring of nlp preprocessing execution * splitting of preprocessing functions into sub folders, models folder now splits into sections * null and constant drop, simplification of percentage calculation in stop tab function * First pass commit at automl code structure with new graphing mechanism * full graph in new format which can run 'basic' .automl.run * Addition of stub files for AutoML graph testing * update to travis test code * first pass at data ingestion, configuration creation and data checking * addition of save path to config, additional checking for NLP * update to config retrieval to support flat files, coinciding refactor of function * renaming of nlp checks to be clearer * removal of overwritten date/time and update to structure/commenting * change to camelCase, addition of image graphs, change to structure * First pass update to include new coding standard definitions * addition of a common location for general use utilities * removal of unnecessary hidden files * Addition of tests for target data functionality * variable -> variant * tests for process based retrieval of feature data * addition of appropriate tests for the dataCheck node * minor updates * update to graph, inclusion of label encodeing symbol mapping to graph both code and images * addition of tests for remaining function in dataCheck node * review of targetData, featureData and dataCheck * added labelEncode functionality and corresponding tests * Initial addition of now renamed featureDescription node, update to graph images * change from modification to description in node naming * removal of unneeded param to dataDescription function, update to tests to cover all expected behaviour * update to automl graph to use new label encode function from the toolkit * added node for modelGeneration. 
Added customization folder for models, scoring funcs etc * addition of funcs.q * updated comments from review of code * Minor improvements to code * changes to keras functions to make it more adaptable for the addition of other models * windows fix for updateConfig (#11) * windows fix for updateConfig - no longer overwrites dir * code tidy up * Addition of dataPreprocessing node (#12) * addition of dataPreprocessing node * cleaned up commenting * updated review changes * minor updates to dataPreprocessing functionality, models definitions updated for keras.q Co-authored-by: Conor McCarthy <[email protected]> * New commenting style required for featureDescription node (#19) * windows fix for updateConfig - no longer overwrites dir * code tidy up * variable declared as global broke tests - changed to local * new commenting style * Feature creation node (#16) * windows fix for updateConfig - no longer overwrites dir * code tidy up * variable declared as global broke tests - changed to local * addition of dataPreprocessing node * cleaned up commenting * updated review changes * minor updates to dataPreprocessing functionality, models definitions updated for keras.q * added featurecreation functionality * cleaned up nlp functions * cleaned up code * updated graph for feat create model, added test print statements. 
Added travis/appveyor PYTHONHASHSEED * updated appveyor build scripts to install embedpy via conda * pythonhashseed env * code review and test changes * removal of old testing data * added tests and error trap for NLP and pulled down review * changed NLP tests * updated NLP tests to use spacy 2.3.2 * updated code in line with comments and added tests for ml.df2tab addition Co-authored-by: Deanna Morgan <[email protected]> Co-authored-by: Conor McCarthy <[email protected]> * Selectmodels node (#18) * windows fix for updateConfig - no longer overwrites dir * code tidy up * variable declared as global broke tests - changed to local * added selectModels node * included funcs.q * test fixes * updated any comments in PR. Pulled down latest version Co-authored-by: Deanna Morgan <[email protected]> * addition of predictParams node (#21) * addition of predictParams node * graph updates Co-authored-by: Conor McCarthy <[email protected]> * Created pathConstruct node (#23) * addition of paramConsolidate node * created pathConstruct node * Automl graph tts (#15) * windows fix for updateConfig - no longer overwrites dir * code tidy up * variable declared as global broke tests - changed to local * addition of featureSignificance node * addition of trainTestSplit node * addition of featureSignificance node * sigFeat fixes * sigFeat error trapping * sigFeat tests * change of train test split output type * train test split tests * correction to featSig tests * correction to featSig tests * correction to featSig tests * test updates * test updates * correlated columns * review of tts * review of sigfeat * correction to sigFeats functions to include one of correlation columns * addition of tests for funcs.q * addition of q/python func check * review of tts, moved qpyFuncSearch to dataCheck * reviewed featSig tests * utils moved to funcs.q for TTS + sz check added * utils moved to funcs.q for TTS + sz check added * removed pythonTTS.p - already in dataCheck * PR changes Co-authored-by: 
Dianeod <[email protected]> * Addition of saveGraph node (#24) * addition of saveGraph node * addition of saveGraph node * addition of extra plots * removed folders created in tests * review of comments made * updated Graph * moved plt to utils, changed marker size Co-authored-by: cmccarthy1 <[email protected]> * Graph runmodels (#17) * windows fix for updateConfig - no longer overwrites dir * code tidy up * variable declared as global broke tests - changed to local * addition of featureSignificance node * addition of trainTestSplit node * addition of featureSignificance node * sigFeat fixes * sigFeat error trapping * sigFeat tests * change of train test split output type * train test split tests * correction to featSig tests * correction to featSig tests * correction to featSig tests * test updates * test updates * correlated columns * review of tts * review of sigfeat * updated * addition of runmodels node * updated graph * runModels review * added number of reps for gs/xv * updated dataCheck test * updated comments made in PR * addition of information for metadata * resolved all comments Co-authored-by: Deanna Morgan <[email protected]> Co-authored-by: dmorgankx <[email protected]> * Automl graph preproc params (#26) * windows fix for updateConfig - no longer overwrites dir * code tidy up * variable declared as global broke tests - changed to local * Update Automl_Graph.drawio * graph, test and code format updates * test print statements and updated graph * Update Automl_Graph.drawio * graph updates * Connection abd workflow clarity for graph images Co-authored-by: Conor McCarthy <[email protected]> * Addition of saveMeta node (#25) * windows fix for updateConfig - no longer overwrites dir * code tidy up * variable declared as global broke tests - changed to local * new commenting style * addition of saveMeta node * addition of saveopt check and tests * node review - moved mdlMeta to funcs, removed repeated code * added print statements. 
Removed pathDict created in pathConstruct * updated modelMeta lib * Addition of tests to check paths/metadata is created Co-authored-by: Deanna Morgan <[email protected]> * OptimizeModels Node (#20) * windows fix for updateConfig - no longer overwrites dir * code tidy up * variable declared as global broke tests - changed to local * new commenting style * added optimization node * update optimization node * updated graph * Update to include confusion matrix and impact dictionary * added regression calculation * node review Co-authored-by: Deanna Morgan <[email protected]> * Fixed any bugs found (#27) * windows fix for updateConfig - no longer overwrites dir * code tidy up * variable declared as global broke tests - changed to local * new commenting style * added optimization node * update optimization node * updated graph * Update to include confusion matrix and impact dictionary * added regression calculation * node review * fixed any bugs found, nor runs through for all nodes Co-authored-by: Deanna Morgan <[email protected]> * Moved testing functions to separate file (#29) * moved passing/failing test to seperate file * added test/utils.q * Addition of saveModel node (#28) * windows fix for updateConfig - no longer overwrites dir * code tidy up * variable declared as global broke tests - changed to local * new commenting style * addition of saveModels node * addition of savemodels node * addition of saveModels node * clearned up if statement * addition of saveModels node * added tests for NLP Co-authored-by: Deanna Morgan <[email protected]> * Updating any bugs so that .`automl.run` works (#30) * removed duplicated function * fixed any errors to make sure automl.run runs through * updated travis to include spacy english model and keras * Automl graph savereport (#31) * windows fix for updateConfig - no longer overwrites dir * code tidy up * variable declared as global broke tests - changed to local * new commenting style * addition of saveModels node * addition 
of savemodels node * addition of saveModels node * clearned up if statement * minor code changes * addition of fpdf report gen * addition of fpdf report gen * updated saveGraph to run latex * report tests * report tests * updated tests * updated image size * latex formatting * Updated latex checking and change to code organization * pdflatex naming typo * Fix to force absolute location of generated reports * update to reportlab generation, new page removes custom font for headers Co-authored-by: Dianeod <[email protected]> Co-authored-by: Conor McCarthy <[email protected]> Co-authored-by: Conor McCarthy <[email protected]> * removed duplicated function (#33) * move hyperparams to json files (#34) * removed duplicated function * move hyperparams to json files * Update to some descriptions of functions and indenting of json Co-authored-by: Conor McCarthy <[email protected]> * Add functionality to add custom save path (#35) * removed duplicated function * added functionality to add custom model save path * move hyperparams to json files (#34) * removed duplicated function * move hyperparams to json files * Update to some descriptions of functions and indenting of json Co-authored-by: Conor McCarthy <[email protected]> * Update to be more strongly typed Co-authored-by: Conor McCarthy <[email protected]> * Introduction of command line interface api for automl (#36) * Initial pass at json driven command line interfacce * Major update to command line interface to support new input naming and allow first pass at fire and forget * update to allow data retrieval via ipc/csv in command line case * update to json format and command line input structure * addition of code commenting for new command line version * Final change to facilitate appropriate model naming conventioj * typo fix * Review of code (#37) * removed duplicated function * review of code * refactor default layout * Reintroduction of prediction mechanism for automl (#38) * first pass at addition of prediction 
functionality * Working pass at retrieval of models from disk * update to remove multiple paths to generate predict function * revert to pre cli_testing merge * Minor updates to clean up NLP and correctly retrieve saved model. Update to feature creation for FRESH to support tabular input * minor fixes to issues with retrieving named models and using the correct save option name * Update tests (#40) * windows fix for updateConfig - no longer overwrites dir * code tidy up * variable declared as global broke tests - changed to local * new commenting style * addition of saveModels node * addition of savemodels node * addition of saveModels node * clearned up if statement * minor code changes * update to tests to be in line with new dictionary input structure * updates in line with requirements for tests with dataPreprocessing node * featureExtractionType across the board change * Update to tests for featureCreation node * update to featureExtractionType name * update to featureSignificance node tests to account for new config structure * update to configuration retrieval to ensure full config retrieved * renaming of config parameters for model optimization and update to feature extraction naming for preprocParams node * update to runModels test config * updates to configurations for train test split node * Fix to bug in data split function and renaming of configuration in testing in line with new functionality * update to saveMeta testing to align with revised structure for prediction functionality * update to saveoption and feature extraction naming in line with new config for saveModels node * Update to configuration for testing of saveReport node * removal of old config definition * Fix to bug introduced with change to hyperparameter function retrieval, update to configuration keys * path error fix and model meta check * reintroduction of test utilities needed for passing/failing test logic * Review of updateTests branch (#42) * removed duplicated function * review 
of code * refactor default layout * review of testing code * Reintroduction of travis testing (#43) * initial update to reintroduce tests * reintroduction of tensorflow install requirement * Change to number of features and minor change to model paths * update to FRESH data to align with correct representation Co-authored-by: Conor McCarthy <[email protected]> Co-authored-by: Deanna Morgan <[email protected]> Co-authored-by: Dianeod <[email protected]> Co-authored-by: Conor McCarthy <[email protected]> Co-authored-by: Dianeod <[email protected]> * Automl scoringmodels (#46) * windows fix for updateConfig - no longer overwrites dir * code tidy up * variable declared as global broke tests - changed to local * new commenting style * addition of saveModels node * addition of savemodels node * addition of saveModels node * clearned up if statement * minor code changes * update to tests to be in line with new dictionary input structure * updates in line with requirements for tests with dataPreprocessing node * featureExtractionType across the board change * Update to tests for featureCreation node * update to featureExtractionType name * update to featureSignificance node tests to account for new config structure * update to configuration retrieval to ensure full config retrieved * renaming of config parameters for model optimization and update to feature extraction naming for preprocParams node * update to runModels test config * updates to configurations for train test split node * Fix to bug in data split function and renaming of configuration in testing in line with new functionality * update to saveMeta testing to align with revised structure for prediction functionality * update to saveoption and feature extraction naming in line with new config for saveModels node * Update to configuration for testing of saveReport node * removal of old config definition * Fix to bug introduced with change to hyperparameter function retrieval, update to configuration keys * move 
to json structure for models * model text files not needed * json additions for models and scoring * scoring json file * apply flag/boolean seed/scoring fixes/docs link * test fixes * test fixes Co-authored-by: Dianeod <[email protected]> Co-authored-by: Conor McCarthy <[email protected]> * review of predict node (#44) * removed duplicated function * review of code * refactor default layout * review of predict function * addition of warning if model comes from an unsupported library Co-authored-by: Conor McCarthy <[email protected]> * reintroduction of load for test utils in saveModels * Adding printing Functionality (#45) * removed duplicated function * added functionality to add custom model save path * Update to be more strongly typed * added print statements * removed file * moved remaining print statements to new format. Added print python warning option * cleaned up printing dict * fixed naming convention * updated naming convention. Adding additional logging parameter * moved api functionality to utils * Updates to clean up ordering of printing, allow logging directories/files to be modified in json definitions, update graph as required and add print for graph file locations Co-authored-by: Conor McCarthy <[email protected]> Co-authored-by: Conor McCarthy <[email protected]> Co-authored-by: Conor McCarthy <[email protected]> Co-authored-by: Deanna Morgan <[email protected]> Co-authored-by: Dianeod <[email protected]> Co-authored-by: Dianeod <[email protected]> Co-authored-by: dmorgankx <[email protected]> Co-authored-by: Conor McCarthy <[email protected]> Co-authored-by: Conor McCarthy <[email protected]> Co-authored-by: Dianeod <[email protected]> Co-authored-by: Deanna Morgan <[email protected]> Co-authored-by: Dianeod <[email protected]> Co-authored-by: dmorgankx <[email protected]> Co-authored-by: Conor McCarthy <[email protected]> * Graph warning (#47) * removed duplicated function * Initial pass at json driven command line interfacce * Major update to 
command line interface to support new input naming and allow first pass at fire and forget * update to allow data retrieval via ipc/csv in command line case * update to json format and command line input structure * addition of code commenting for new command line version * Final change to facilitate appropriate model naming conventioj * typo fix * review of code * refactor default layout * first pass at addition of prediction functionality * Review of code (#37) * removed duplicated function * review of code * refactor default layout * Working pass at retrieval of models from disk * update to remove multiple paths to generate predict function * revert to pre cli_testing merge * Minor updates to clean up NLP and correctly retrieve saved model. Update to feature creation for FRESH to support tabular input * minor fixes to issues with retrieving named models and using the correct save option name * review of predict function * Update tests (#40) * windows fix for updateConfig - no longer overwrites dir * code tidy up * variable declared as global broke tests - changed to local * new commenting style * addition of saveModels node * addition of savemodels node * addition of saveModels node * clearned up if statement * minor code changes * update to tests to be in line with new dictionary input structure * updates in line with requirements for tests with dataPreprocessing node * featureExtractionType across the board change * Update to tests for featureCreation node * update to featureExtractionType name * update to featureSignificance node tests to account for new config structure * update to configuration retrieval to ensure full config retrieved * renaming of config parameters for model optimization and update to feature extraction naming for preprocParams node * update to runModels test config * updates to configurations for train test split node * Fix to bug in data split function and renaming of configuration in testing in line with new functionality * update to 
saveMeta testing to align with revised structure for prediction functionality * update to saveoption and feature extraction naming in line with new config for saveModels node * Update to configuration for testing of saveReport node * removal of old config definition * Fix to bug introduced with change to hyperparameter function retrieval, update to configuration keys * path error fix and model meta check * reintroduction of test utilities needed for passing/failing test logic * Review of updateTests branch (#42) * removed duplicated function * review of code * refactor default layout * review of testing code * Reintroduction of travis testing (#43) * initial update to reintroduce tests * reintroduction of tensorflow install requirement * Change to number of features and minor change to model paths * update to FRESH data to align with correct representation Co-authored-by: Conor McCarthy <[email protected]> Co-authored-by: Deanna Morgan <[email protected]> Co-authored-by: Dianeod <[email protected]> Co-authored-by: Conor McCarthy <[email protected]> Co-authored-by: Dianeod <[email protected]> * add capability to ignore warnings/error statements * Automl scoringmodels (#46) * windows fix for updateConfig - no longer overwrites dir * code tidy up * variable declared as global broke tests - changed to local * new commenting style * addition of saveModels node * addition of savemodels node * addition of saveModels node * clearned up if statement * minor code changes * update to tests to be in line with new dictionary input structure * updates in line with requirements for tests with dataPreprocessing node * featureExtractionType across the board change * Update to tests for featureCreation node * update to featureExtractionType name * update to featureSignificance node tests to account for new config structure * update to configuration retrieval to ensure full config retrieved * renaming of config parameters for model optimization and update to feature extraction naming 
for preprocParams node * update to runModels test config * updates to configurations for train test split node * Fix to bug in data split function and renaming of configuration in testing in line with new functionality * update to saveMeta testing to align with revised structure for prediction functionality * update to saveoption and feature extraction naming in line with new config for saveModels node * Update to configuration for testing of saveReport node * removal of old config definition * Fix to bug introduced with change to hyperparameter function retrieval, update to configuration keys * move to json structure for models * model text files not needed * json additions for models and scoring * scoring json file * apply flag/boolean seed/scoring fixes/docs link * test fixes * test fixes Co-authored-by: Dianeod <[email protected]> Co-authored-by: Conor McCarthy <[email protected]> * review of predict node (#44) * removed duplicated function * review of code * refactor default layout * review of predict function * addition of warning if model comes from an unsupported library Co-authored-by: Conor McCarthy <[email protected]> * cleaned up code * added more verbose warnings. Changed the location of removal of previous savePaths * Update to reverse ordering of warning levels, fixes to deletion logic for tests, cfg->config Co-authored-by: Conor McCarthy <[email protected]> Co-authored-by: cmccarthy1 <[email protected]> Co-authored-by: Deanna Morgan <[email protected]> Co-authored-by: dmorgankx <[email protected]> Co-authored-by: Conor McCarthy <[email protected]> * addition of fit/predict functionality and fix to retrieval of models based on name * Revert "addition of fit/predict functionality and fix to retrieval of models based on name" This reverts commit 0e70c17b00aa48af545d892190e235daf46d6af4. 
* Addition of Theano capability (#48) * removed duplicated function * added capability for adding a Theano model * Update to Theano model support to remove models and allow run to continue if theano not installed * added theano model check. Cleaned up printWarnings dict. Fixed print to screen check if saveOpt is 0 Co-authored-by: Conor McCarthy <[email protected]> * Reintroduction of fit-predict tests and fix to named model retrieval (#49) * addition of fit-predict tests and fix to retrieval of named models * Graph testing upd (#50) * removed duplicated function * added print statements, included all test files in a bat file * Changed txt file to bat file in travis Co-authored-by: cmccarthy1 <[email protected]> Co-authored-by: Conor McCarthy <[email protected]> Co-authored-by: Dianeod <[email protected]> * Overall code review (#51) * windows fix for updateConfig - no longer overwrites dir * code tidy up * variable declared as global broke tests - changed to local * new commenting style * addition of saveModels node * addition of savemodels node * addition of saveModels node * clearned up if statement * minor code changes * code review * code review * code review * code review * code review * code review * code review * code review * fixes to dataCheck tests * test updates for windows * test fixes * conflict fixes * review of changes to overall codebase * minor change to selectModels test Co-authored-by: Dianeod <[email protected]> Co-authored-by: Conor McCarthy <[email protected]> * Graph log warning tests (#52) * removed duplicated function * added logging tests * addition of warning/theano/torch tests * fix for appveyor and travis tests * fixed appveyor build * removed swp file * updated ignorewarnings print statement * minor updates, torch change required for non gpu install torch Co-authored-by: Conor McCarthy <[email protected]> * Addition of retrieval logic to get nearest model based on start date (#53) * removed duplicated function * Initial pass at 
retrieval of closest model * added capability for adding a Theano model * Update to Theano model support to remove models and allow run to continue if theano not installed * Addition of model deletion functionality * removal of code duplication * Graph delete models (#55) * added logging tests * addition of warning/theano/torch tests * fix for appveyor and travis tests * fixed appveyor build * removed swp file * updated ignorewarnings print statement * minor updates, torch change required for non gpu install torch * review of code * fix delete models * fix for getModels using time Co-authored-by: Conor McCarthy <[email protected]> * addition of command line interface test and addition of test flag for running cli automl (#54) Co-authored-by: Conor McCarthy <[email protected]> * Graph fix misc (#57) * fixed logging andw warning tests;Updated README; Check for wrong input * revert changed to TF print * changed date/time to original format * cleaned up code * cleaned up code * Graph tests (#58) * reduced tests for appveyor timeout * reduced number of iterations for Theano Co-authored-by: Dianeod <[email protected]> Co-authored-by: Conor McCarthy <[email protected]> Co-authored-by: Dianeod <[email protected]> Co-authored-by: Dianeod <[email protected]> Co-authored-by: Dianeod <[email protected]> Co-authored-by: dmorgankx <[email protected]> Co-authored-by: Conor McCarthy <[email protected]> Co-authored-by: Deanna Morgan <[email protected]> Co-authored-by: Conor McCarthy <[email protected]> * Update requirements.txt Co-authored-by: Dianeod <[email protected]> Co-authored-by: Dianeod <[email protected]> Co-authored-by: dmorgankx <[email protected]> Co-authored-by: Conor McCarthy <[email protected]> Co-authored-by: Deanna Morgan <[email protected]> Co-authored-by: Conor McCarthy <[email protected]> * Update README.md * minor change to binary search for model retrieval by date-time, update to allow symbol saveModelName, minor path printing issue * Addition of recursive 
deleted function (#15) * added recursive deleted function * removed keyPath * fixed travis issue for mac * added check for deleting relevant dates * Change order of remove constant and add nulls (#16) * Change order of remove constant and add nulls * swap order of function constant values and null values Co-authored-by: unknown <Andrew Morrison> * Refactor of NLP library (#26) * update merge * update function name * fixed findDates function to account for the word of or in between dates, months and years * removed sys argv statement * fix appveyor and travis files * fix copy error * removed getembedpy * cluster refactor * fixed indentations * fixed @ * Update of date_time.q to new format * update email to new format and commenting style * Fix commenting error * review of parser * fix email error * fixed bug * updated comments * update commenting * updated comments * review of parser code * Updates to move utils to .i, removal of duplicate email function definitions * moved callable functions to the end * moved callable functions to the end * Minor consistency update * moved python funcs * review of regex function * Updates to parser functionality * Minor updates to regex string matching refactor * review of sent * fix indentation * fixed length of line to be <80 in regex * review of utils functions * fixed indentation * initial review of nlp_code * moved functions to nlp_code.q * Minor changes to sentiment analysis functionality * renamed files * minor description updates for nlp utilities * reintroduction of embedPy load * updated removeMain and added filelength.t * minor updates to coincide with docs * update to coincide with docs * changed input names * update comments * nlp code review qdocs and headers * updates following comments * adding dictionarys kind and type * two small changes Co-authored-by: Conor McCarthy <[email protected]> Co-authored-by: Conor McCarthy <[email protected]> Co-authored-by: Conor McCarthy <[email protected]> Co-authored-by: Conor 
McCarthy <[email protected]> Co-authored-by: andrewmorrison1 <[email protected]> * Update to reflect ML Toolkit refactor (#17) * update to new ml format * update ml functions for new refactor * feb 3rd automl code review * march 3rd code review * update to infreplace * response to comments on automl * update test to run on windows * changes after comments part 2 * second review of comments * reply to latest comments * review * predict - > transform * resolved comments Co-authored-by: andrewmorrison1 <[email protected]> * add sharpe ratio (#19) * imported documentation from code.kx.com (#20) * import documentation from code.kx.com, adapt links; converted to GFM * minor edits and fixes * move ml to subfolder * Support for pykx 2.5.3 & python 3.11 * restructure cleanup * support pykx & embedpy loaded before shim.q * Support loading mproc before .ml namespace * Fix automl output graph labeling * Add examples & rework READMEs * Add links to main README * Update old links * more links * used pinned requirements in dockerfile * remove .gitlab-ci, move shim to ml & deffer docker content * avoid shim.q dependency --------- Co-authored-by: Fionnuala Carr <[email protected]> Co-authored-by: James Hanna <[email protected]> Co-authored-by: jhanna-kx <[email protected]> Co-authored-by: fionncarr <[email protected]> Co-authored-by: Dianeod <[email protected]> Co-authored-by: diane <[email protected]> Co-authored-by: awilson-kx <[email protected]> Co-authored-by: cmccarthy1 <[email protected]> Co-authored-by: awilson-kx <[email protected]> Co-authored-by: Conor McCarthy <[email protected]> Co-authored-by: Deanna Morgan <[email protected]> Co-authored-by: dmorgankx <[email protected]> Co-authored-by: Conor McCarthy <[email protected]> Co-authored-by: Conor McCarthy <[email protected]> Co-authored-by: Conor McCarthy <[email protected]> Co-authored-by: Conor McCarthy <[email protected]> Co-authored-by: Conor McCarthy <[email protected]> Co-authored-by: andrewmorrison1 <[email 
protected]> Co-authored-by: Conor McCarthy <[email protected]> Co-authored-by: Conor McCarthy <[email protected]> Co-authored-by: andrewmorrison1 <[email protected]> Co-authored-by: Stephen Taylor <[email protected]>