Collecting impyla
Downloading https://files.pythonhosted.org/packages/6f/96/92f933cd216f9ff5d7f4ba7e0615a51ad4e3beb31a7de60f7df365378bb9/impyla-0.14.1-py2-none-any.whl (165kB)
100% |████████████████████████████████| 174kB 464kB/s
Collecting bitarray (from impyla)
Downloading https://files.pythonhosted.org/packages/0a/da/9f61d28a20c42b4963334efacfd257c85150ede96d0cd2509b37da69da47/bitarray-0.8.1.tar.gz (46kB)
100% |████████████████████████████████| 51kB 5.1MB/s
Collecting thrift<=0.9.3 (from impyla)
Downloading https://files.pythonhosted.org/packages/ae/58/35e3f0cd290039ff862c2c9d8ae8a76896665d70343d833bdc2f748b8e55/thrift-0.9.3.tar.gz
Requirement already satisfied: six in /usr/lib/python2.7/site-packages (from impyla) (1.11.0)
Installing collected packages: bitarray, thrift, impyla
Running setup.py install for bitarray ... done
Running setup.py install for thrift ... done
Successfully installed bitarray-0.8.1 impyla-0.14.1 thrift-0.9.3
[root@cdh-vm ~]# pip install sqlalchemy
Collecting sqlalchemy
Downloading https://files.pythonhosted.org/packages/c1/c8/392fcd2d01534bc871c65cb964e0b39d59feb777e51649e6eaf00f6377b5/SQLAlchemy-1.2.7.tar.gz (5.6MB)
100% |████████████████████████████████| 5.6MB 721kB/s
Installing collected packages: sqlalchemy
Running setup.py install for sqlalchemy
[cdh-vm.dbaglobe.com:21000] > create table quarters(salesman string,q1 int,q2 int,q3 int,q4 int) row format delimited fields terminated by ',' tblproperties('skip.header.line.count'='1');
Query: create table quarters(salesman string,q1 int,q2 int,q3 int,q4 int) row format delimited fields terminated by ','
Fetched 0 row(s) in 0.50s
[cdh-vm.dbaglobe.com:21000] > load data inpath '/data/quarters.csv' overwrite into table quarters;
Query: load data inpath '/data/quarters.csv' overwrite into table quarters
+----------------------------------------------------------+
| summary |
+----------------------------------------------------------+
| Loaded 1 file(s). Total files in destination location: 1 |
+----------------------------------------------------------+
Fetched 1 row(s) in 2.89s
[cdh-vm.dbaglobe.com:21000] > select * from quarters;
Query: select * from quarters
Query submitted at: 2018-04-25 21:22:18 (Coordinator: http://cdh-vm.dbaglobe.com:25000)
Query progress can be monitored at: http://cdh-vm.dbaglobe.com:25000/query_plan?query_id=77405085e748686d:b3aebdc200000000
+----------+--------+--------+--------+--------+
| salesman | q1 | q2 | q3 | q4 |
+----------+--------+--------+--------+--------+
| Boris | 602908 | 233879 | 354479 | 32704 |
| Bob | 43790 | 514863 | 297151 | 544493 |
| Tommy | 392668 | 113579 | 430882 | 247231 |
| Travis | 834663 | 266785 | 749238 | 570524 |
| Donald | 580935 | 411379 | 110390 | 651572 |
| Ted | 656644 | 70803 | 375948 | 321388 |
| Jeb | 486141 | 600753 | 742716 | 404995 |
| Stacy | 479662 | 742806 | 770712 | 2501 |
| Morgan | 992673 | 879183 | 37945 | 293710 |
+----------+--------+--------+--------+--------+
[donghua@cdh-vm pandas]$ ipython
Python 2.7.5 (default, Aug 4 2017, 00:39:18)
Type "copyright", "credits" or "license" for more information.
IPython 5.5.0 -- An enhanced Interactive Python.
? -> Introduction and overview of IPython's features.
%quickref -> Quick reference.
help -> Python's own help system.
object? -> Details about 'object', use 'object??' for extra details.
In [1]: from impala.dbapi import connect
In [2]: conn = connect(host='cdh-vm',port=21050,database='test')
In [3]: cur = conn.cursor()
In [4]: cur.execute('show tables')
In [5]: cur.fetchall()
Out[5]:
[('byname_kudu',),
('getrelationinfobyname',),
('getrelationinfobyname_kudu',),
('quarters',),
('t1',),
('t_timestamp',),
('tsdemo',),
('tsstr',),
('vehicles',)]
In [6]: cur.execute('select * from quarters limit 5')
In [8]: cur.fetchall()
Out[8]:
[('Boris', 602908, 233879, 354479, 32704),
('Bob', 43790, 514863, 297151, 544493),
('Tommy', 392668, 113579, 430882, 247231),
('Travis', 834663, 266785, 749238, 570524),
('Donald', 580935, 411379, 110390, 651572)]
In [9]: from impala.util import as_pandas
In [10]: cur.execute('select * from quarters')
In [11]: df = as_pandas(cur)
In [12]: type(df)
Out[12]: pandas.core.frame.DataFrame
In [13]: df
Out[13]:
salesman q1 q2 q3 q4
0 Boris 602908 233879 354479 32704
1 Bob 43790 514863 297151 544493
2 Tommy 392668 113579 430882 247231
3 Travis 834663 266785 749238 570524
4 Donald 580935 411379 110390 651572
5 Ted 656644 70803 375948 321388
6 Jeb 486141 600753 742716 404995
7 Stacy 479662 742806 770712 2501
8 Morgan 992673 879183 37945 293710
In [16]: df2 = df.melt(id_vars='salesman')
In [17]: df2
Out[17]:
salesman variable value
0 Boris q1 602908
1 Bob q1 43790
2 Tommy q1 392668
3 Travis q1 834663
4 Donald q1 580935
5 Ted q1 656644
6 Jeb q1 486141
7 Stacy q1 479662
8 Morgan q1 992673
9 Boris q2 233879
10 Bob q2 514863
11 Tommy q2 113579
12 Travis q2 266785
13 Donald q2 411379
14 Ted q2 70803
15 Jeb q2 600753
16 Stacy q2 742806
17 Morgan q2 879183
18 Boris q3 354479
19 Bob q3 297151
20 Tommy q3 430882
21 Travis q3 749238
22 Donald q3 110390
23 Ted q3 375948
24 Jeb q3 742716
25 Stacy q3 770712
26 Morgan q3 37945
27 Boris q4 32704
28 Bob q4 544493
29 Tommy q4 247231
30 Travis q4 570524
31 Donald q4 651572
32 Ted q4 321388
33 Jeb q4 404995
34 Stacy q4 2501
35 Morgan q4 293710
In [11]: import sqlalchemy
In [12]: import impala.sqlalchemy as i
In [13]: engine=sqlalchemy.create_engine('impala://cdh-vm:21050/test')
In [26]: df2.to_sql(name='quarters_melt',con=engine,index=False,dtype={'salesman':i.STRING,'variable':i.STRING,'value':i.INT})
cdh-vm.dbaglobe.com:21000] > desc quarters_melt;
Query: describe quarters_melt
+----------+--------+---------+
| name | type | comment |
+----------+--------+---------+
| salesman | string | |
| variable | string | |
| value | int | |
+----------+--------+---------+
Fetched 3 row(s) in 0.03s
[cdh-vm.dbaglobe.com:21000] > show create table quarters_melt;
Query: show create table quarters_melt
+--------------------------------------------------------------------------------------+
| result |
+--------------------------------------------------------------------------------------+
| CREATE TABLE test.quarters_melt ( |
| salesman STRING, |
| variable STRING, |
| value INT |
| ) |
| STORED AS TEXTFILE |
| LOCATION 'hdfs://cdh-vm.dbaglobe.com:8020/user/hive/warehouse/test.db/quarters_melt' |
| |
+--------------------------------------------------------------------------------------+
Fetched 1 row(s) in 0.00s
[cdh-vm.dbaglobe.com:21000] >
[cdh-vm.dbaglobe.com:21000] > select * from quarters_melt;
Query: select * from quarters_melt
Query submitted at: 2018-04-25 22:19:06 (Coordinator: http://cdh-vm.dbaglobe.com:25000)
Query progress can be monitored at: http://cdh-vm.dbaglobe.com:25000/query_plan?query_id=c14ae16dfcb301bb:c31d20a200000000
+----------+----------+--------+
| salesman | variable | value |
+----------+----------+--------+
| Donald | q1 | 580935 |
| Jeb | q3 | 742716 |
| Ted | q3 | 375948 |
| Bob | q4 | 544493 |
| Donald | q2 | 411379 |
| Morgan | q2 | 879183 |
| Boris | q4 | 32704 |
| Boris | q2 | 233879 |
| Tommy | q1 | 392668 |
| Jeb | q4 | 404995 |
| Boris | q3 | 354479 |
| Bob | q3 | 297151 |
| Morgan | q3 | 37945 |
| Travis | q2 | 266785 |
| Travis | q4 | 570524 |
| Ted | q2 | 70803 |
| Bob | q1 | 43790 |
| Tommy | q3 | 430882 |
| Stacy | q1 | 479662 |
| Bob | q2 | 514863 |
| Stacy | q4 | 2501 |
| Travis | q3 | 749238 |
| Travis | q1 | 834663 |
| Tommy | q2 | 113579 |
| Jeb | q2 | 600753 |
| Tommy | q4 | 247231 |
| Stacy | q2 | 742806 |
| Donald | q4 | 651572 |
| Morgan | q4 | 293710 |
| Stacy | q3 | 770712 |
| Morgan | q1 | 992673 |
| Jeb | q1 | 486141 |
| Donald | q3 | 110390 |
| Ted | q1 | 656644 |
| Boris | q1 | 602908 |
| Ted | q4 | 321388 |
+----------+----------+--------+
Fetched 36 row(s) in 0.25s
No comments:
Post a Comment