HdfsCLI
=======

Version 2.0 (2015/08/20)
------------------------
* Add Python 3 support (the Kerberos extension's requirements must, however, be
manually installed).
* Allow specifying relative client roots. These will be assumed relative to the
user's home directory.
* Add several client methods: `makedirs`, `set_times`, `checksum`,
`set_replication`, etc.
* Add `progress` argument to `Client.read`, `Client.upload`, and
`Client.download`. Also add a `chunk_size` argument to the latter to allow
better tracking.
* Add `strict` option to `Client.status` and `Client.content` to perform path
existence checks.
* `Client.write` can now be used as a context manager (returning a file-like
object).
* `Client.read` and `Client.write` can now return file-like objects (supporting
`read` and `write` calls respectively).
* Improve robustness of `KerberosClient`. In particular, add the
`max_concurrency` parameter, which can be tuned to prevent authentication
errors when too many simultaneous requests are being made. Along with the new
delay parameter, this lets us remove the timeouts in `Client.download` and
`Client.upload`, which both simplifies and speeds up these functions.
* Add `Config` class which handles all CLI configuration (e.g. aliases and
logging).
* Add `autoload.modules` and `autoload.paths` configuration options.
* Rename alias configuration sections to `ALIAS.alias` (the old format,
`ALIAS_alias`, is still supported).
* Add `--verbose` option to CLI, enabling logging at various levels.
* Switch Avro extension to using `fastavro`. This speeds up `AvroWriter` and
`AvroReader` by a significant amount (~5 times faster on a reasonable
connection).
* Add `write` command to Avro CLI.
* Remove CSV support for dataframe extension.
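
As a rough illustration of the file-like objects that `Client.read` and `Client.write` can now return, the sketch below copies one remote file to another inside `with` blocks. `copy_remote` and `InMemoryClient` are hypothetical names: `InMemoryClient` is only a stand-in that mimics the two methods so the snippet runs without a cluster, and it is not part of the library.

```python
import io
from contextlib import contextmanager


def copy_remote(client, src_path, dst_path):
    """Copy src_path to dst_path through file-like objects (hypothetical helper)."""
    # Both calls are used as context managers, so whatever they opened
    # is released when the blocks exit.
    with client.read(src_path) as reader:
        with client.write(dst_path) as writer:
            writer.write(reader.read())


class InMemoryClient:
    """Stand-in for a real client, for illustration only (not part of hdfs)."""

    def __init__(self, files=None):
        self.files = dict(files or {})

    @contextmanager
    def read(self, hdfs_path):
        # Yield a file-like object supporting `read` calls.
        yield io.BytesIO(self.files[hdfs_path])

    @contextmanager
    def write(self, hdfs_path):
        # Yield a file-like object supporting `write` calls, then store
        # whatever was written once the block exits normally.
        buffer = io.BytesIO()
        yield buffer
        self.files[hdfs_path] = buffer.getvalue()


client = InMemoryClient({'logs/raw.txt': b'first line\n'})
copy_remote(client, 'logs/raw.txt', 'logs/copy.txt')
```

Since `copy_remote` relies only on the two context managers, it should work with a real client as well, assuming that client accepts the same positional path argument.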

Breaking changes:

* Change default configuration file path (to `~/.hdfscli.cfg`).
* Change location of `default.alias` option in configuration file (from command
specific section to `global`).
* Add `session` argument to `Client` and remove newly redundant parameters
(e.g. `verify`, `cert`).
* Remove `Client.from_alias` method (delegated to the new `Config` class). The
`Client.from_options` method is now public (renamed from
`Client._from_options`).
* Change default entry point name to `hdfscli` to avoid clashing with Hadoop
HDFS script.
* `Client.delete` now returns a boolean indicating success (rather than failing
if the path was already empty).
* `Client.read` must now be used inside a `with` block. This ensures that
connections are properly closed. The context manager now returns a file-like
object (useful for composing with other functions, e.g. to decode Avro).
Setting `chunk_size` to a positive value will make it return a generator,
similar to the previous behavior.
* Rename `Client.set_permissions` to `Client.set_permission` to make
`permission` argument uniform across `Client` methods (always singular,
consistent with WebHDFS API).
* `Client.parts` will now throw an error if called on a normal file.
* Make most client attributes private (e.g. `cert`, `timeout`), except
`url` and `root`.
* Remove `--` prefix from CLI commands. Also simplify CLI to only interactive,
download and upload commands (write and read behavior can be achieved by
passing `-` as the local path). The `Client` API changes should make it more
convenient to perform these from a Python shell.
* Rename several CLI options (e.g. `--log`, `--force`, `--version`).
* Change meaning of `n_threads` option in `Client.download` and
`Client.upload`. `0` now means one thread per part-file rather than a single
thread.
* Change `Client.walk` to be consistent with `os.walk`. Also change meaning of
`depth` option (`0` being unlimited).
* Add `status` option to `Client.list`, `Client.walk`, and `Client.parts`. By
default these functions now only return the names of the relevant files and
folders.
* Remove `Client.append` method (replaced by `append` keyword argument to
`Client.write`).
* Symbols exported by extensions aren't imported in the main `hdfs` module
anymore. This removes the need for some custom error handling (when
dependency requirements weren't met).
* Remove Bash autocompletion file (for now).
* Remove compatibility layer for entry point configuration (i.e.
`HDFS_ENTRY_POINT` isn't supported anymore).
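
The configuration changes in this release can be pictured with a small sketch. It is one reading of the notes above, not the library's `Config` implementation. It parses a file laid out the new way (a `global` section holding `default.alias`, alias sections named `ALIAS.alias`) and also accepts the old `ALIAS_alias` section names. The `url` option, the `dev` and `prod` aliases, the hostnames and the `resolve_alias` helper are all assumptions made up for the example.

```python
import configparser

# Example contents for ~/.hdfscli.cfg (hypothetical aliases and URLs).
SAMPLE = """
[global]
default.alias = dev

[dev.alias]
url = http://dev.namenode:50070

[prod_alias]
url = http://prod.namenode:50070
"""


def resolve_alias(parser, alias=None):
    """Return the options of an alias section as a dict (hypothetical helper).

    Falls back to `default.alias` from the `global` section when no alias
    is given, and tries the new section name before the old one.
    """
    if alias is None:
        alias = parser.get('global', 'default.alias')
    for section in ('%s.alias' % alias, '%s_alias' % alias):
        if parser.has_section(section):
            return dict(parser.items(section))
    raise KeyError('no section found for alias %r' % alias)


parser = configparser.ConfigParser()
parser.read_string(SAMPLE)

default_options = resolve_alias(parser)       # found in the new-style section
prod_options = resolve_alias(parser, 'prod')  # found in the old-style section
```

Trying the new section name first means a file that defines both forms for the same alias resolves to the new one. That ordering is a choice made for this sketch.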

Version 1.4.0 (2015/07/24)
--------------------------
* Add support for download and upload of arbitrary folders.
* Deprecate `Client.append` (in favor of `append` argument to `Client.write`).
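
A minimal sketch of moving off the deprecated method: the append becomes a `Client.write` call with the `append` argument set. `append_bytes` and `RecordingClient` are hypothetical names, and `RecordingClient` only mimics the part of the `write` signature assumed here (path, data, `append`) so that the snippet can run without a cluster.

```python
def append_bytes(client, hdfs_path, data):
    """Append data to an existing remote file (hypothetical helper)."""
    # Replaces a call to the deprecated `Client.append` method.
    client.write(hdfs_path, data, append=True)


class RecordingClient:
    """Stand-in for a real client, for illustration only (not part of hdfs)."""

    def __init__(self):
        self.files = {}

    def write(self, hdfs_path, data, append=False):
        # Appending extends the stored content; otherwise the stand-in
        # simply replaces it (a simplification of real write behavior).
        existing = self.files.get(hdfs_path, b'') if append else b''
        self.files[hdfs_path] = existing + data


client = RecordingClient()
client.write('events.log', b'start\n')
append_bytes(client, 'events.log', b'stop\n')
```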

Version 1.1.0 (2015/06/23)
--------------------------
* Rename Avro extension entry point to `hdfs-avro`.

Version 1.0.1 (2015/06/17)
--------------------------
* Added support for Windows.
* Added support for remote filepaths with `=` characters.

Version 0.3.0 (2014/11/14)
--------------------------
* Added `--interactive` command.

Breaking changes:

* Renamed `--info` command to `--list`.
* Made `--interactive` the new default command.

Version 0.2.6 (2014/08/04)
--------------------------
* Added parallelized downloading.
* Added Avro-format reading and writing.
* Added `hdfs.ext.dataframe` extension.

Version 0.2.0 (2014/04/26)
--------------------------
* Added `Client.status` and `Client.content` methods.
* Added callback to `Client.write`.

Breaking changes:

* Removed content from `Client.walk`.
* Simplified CLI. All downloads and uploads are normalized through standard in
and standard out.

Version 0.1.0 (2014/03/25)
--------------------------
* Initial release