Python轮子:造假高手——faker(数据生成器)

原文链接:http://www.juzicode.com/python-module-faker

在测试写好的代码时通常需要用到一些测试数据,大量的真实数据有时候很难获取,如果手动制造测试数据又过于繁重无聊,显得不够优雅,今天我们介绍的faker这个轮子可以完美的解决这个问题。faker是一个用于生成各种类型假数据的库,包括名字、地址、个人信息等等。

安装与导入

安装方法还是一如既往地使用pip,导入模块使用import:

pip install faker

import faker

可以通过faker.VERSION查看当前的版本号,当前的版本是25.6.0:

import faker
print('version:', faker.VERSION)

快速入门

先上一个开胃菜,用Faker()创建一个实例后,就可以调用其各种方法产生数据了。比如可以生成人名:

# VX公众号:juzicode/桔子code
# www.juzicode.com
import faker
f = faker.Faker()
for _ in range(5):
    print(f.name())

运行结果:

Steven Willis
Benjamin Hill
Melvin Rojas
Jennifer Murphy
Joshua Campbell

还可以用来生成生日,电话号码,地址等等:

# VX公众号:juzicode/桔子code
# www.juzicode.com
import faker
f = faker.Faker()
for _ in range(3):
    print('------------')
    print(f.date_of_birth())
    print(f.phone_number())
    print(f.address())

运行结果:

------------
1997-09-01
529.482.9021
Unit 8185 Box 6814
DPO AA 37409
------------
2015-09-17
(301)615-7713x0680
7840 Day Harbor Suite 026
East Mathewmouth, VI 97795
------------
1910-03-22
562.480.2124
5117 Smith Garden Apt. 887
Mckeemouth, NY 41801

当然你也可以使用profile()方法一次性生成个人信息,它会返回一个字典:

# VX公众号:juzicode/桔子code
# www.juzicode.com
import faker
f = faker.Faker()
for _ in range(3):
    print('------------')
    print(f.profile()) 

运行结果:

------------
{'job': 'Plant breeder/geneticist', 'company': 'Chavez Group', 'ssn': '074-71-6734', 'residence': '73464 Ellen Oval Suite 221\nJohnstonton, MH 34069', 'current_location': (Decimal('24.686039'), Decimal('-72.858181')), 'blood_group': 'AB-', 'website': ['https://flores.net/'], 'username': 'millernicole', 'name': 'Johnathan Brown', 'sex': 'M', 'address': '758 Margaret Circles\nEast Kimberly, SD 83729', 'mail': 'donald53@hotmail.com', 'birthdate': datetime.date(1957, 12, 26)}
------------
{'job': 'Pharmacologist', 'company': 'Hines-Church', 'ssn': '227-59-1302', 'residence': '8967 Jones Dam Suite 855\nWest Joshua, AL 96476', 'current_location': (Decimal('-17.084763'), Decimal('-50.191502')), 'blood_group': 'B-', 'website': ['http://www.west.com/', 'https://smith-bond.com/', 'https://www.bennett.com/', 'http://www.riddle-henry.com/'], 'username': 'brobinson', 'name': 'Rebecca Wagner', 'sex': 'F', 'address': '16695 Benjamin Stravenue\nEast Ashleyport, RI 03937', 'mail': 'mary26@hotmail.com', 'birthdate': datetime.date(2010, 7, 23)}
------------
{'job': 'Scientist, water quality', 'company': 'Morrison-Smith', 'ssn': '764-95-8711', 'residence': 'Unit 7343 Box 8997\nDPO AE 53882', 'current_location': (Decimal('-45.831201'), Decimal('-108.873691')), 'blood_group': 'O-', 'website': ['http://www.peterson.info/', 'https://www.yu-garcia.com/', 'https://floyd.com/'], 'username': 'kellysabrina', 'name': 'Kevin Farrell', 'sex': 'M', 'address': 'PSC 8064, Box 5841\nAPO AP 42559', 'mail': 'brettobrien@gmail.com', 'birthdate': datetime.date(1975, 2, 17)}

在这个基础上再配合一些excel文件操作的轮子,就可以生成海量的个人数据表格了。

生成本地化数据

上面的例子产生的数据都是英文格式的,faker还支持本地化数据的生成,在创建实例的时候传入本地化名称就可以生成本地化的数据了,比如zh_CN表示简体中文,de表示德语:

# VX公众号:juzicode/桔子code
# www.juzicode.com
import faker
f = faker.Faker('zh_CN') # 简体中文
print(f.profile()) 
print('------------')
f = faker.Faker('de')    # 德语
print(f.profile()) 

运行结果:

{'job': '客户关系经理/主管', 'company': '昂歌信息网络有限公司', 'ssn': '500119198203064590', 'residence': '湖北省兴安盟县西夏张路E座 464036', 'current_location': (Decimal('7.489519'), Decimal('-87.200968')), 'blood_group': 'O-', 'website': ['https://www.br.cn/', 'https://yangtan.cn/', 'https://www.yan.cn/', 'http://www.liaodu.net/'], 'username': 'yongcheng', 'name': '王璐', 'sex': 'M', 'address': '天津市齐齐哈尔市 南长太原路j座 219593', 'mail': 'uma@hotmail.com', 'birthdate': datetime.date(1992, 5, 16)}
------------
{'job': 'Florist', 'company': 'Zorbach Linke GmbH', 'ssn': '091-65-7927', 'residence': 'Mosemanngasse 344\n40555 Ribnitz-Damgarten', 'current_location': (Decimal('-12.8105965'), Decimal('-116.973434')), 'blood_group': 'AB+', 'website': ['https://www.etzold.com/'], 'username': 'ruppertfrancesco', 'name': 'Pavel Tröst', 'sex': 'M', 'address': 'Heide-Marie-Kensy-Straße 52\n47818 Regensburg', 'mail': 'birgitfiebig@googlemail.com', 'birthdate': datetime.date(1922, 2, 15)}

从生成的数据可以看出不止语言种类变成本地的了,证件号码等也更符合本地习惯了。

生成Python类型数据

可以用faker构造Python各种类型的数据,调用方法的字面含义对应了各种数据类型:

# VX公众号:juzicode/桔子code
# www.juzicode.com
import faker
f = faker.Faker()
print(f.pybool())
print(f.pydecimal())
print(f.pyfloat())
print(f.pyint())
print(f.pyiterable())
print(f.pyobject())
print(f.pyset())
print(f.pystr())
print(f.pystr_format())
print(f.pystruct())
print(f.pytuple()) 

运行结果:

False
-7534455542987561356113628832979261.70990856922647895810773411809628052352287239687350573409296713442359054729
427159196.361446
6675
{'pgraham@example.net', Decimal('6872284488449216471957839793155854569545064378722120702945395180848502303424001840580570158125356748.7793062'), 'EXWgzlwSpyWFShgwHNDF', 'fSvHqsnaGWsGhEEJIotB', 'http://davis-mcclure.com/main/blogindex.html', datetime.datetime(1982, 12, 14, 23, 27, 57), 'fpexXngYbuLutAgmWOJE', 3096, 'http://briggs.org/blog/blogpost.jsp', datetime.datetime(1974, 12, 23, 16, 31, 21), 692152127.295639}
None
{1473, datetime.datetime(2012, 2, 20, 4, 35), Decimal('-8691892256320476451330461966694019289630053652.9818474697172363980351686154'), Decimal('-6092666319162567623091417475408257822206629559682585452982.094118774251103965849'), 6213, 'RrnoGIfvmuQBoBdlpIzc', 'https://gutierrez.biz/searchfaq.html', 'cSPqoWxjwTfEGHIKNljK', 'https://phillips-choi.biz/main/tag/wp-contenthome.php', 'uihAHgngmSfDDgHLGIJP', datetime.datetime(1997, 7, 31, 19, 57, 25)}
zayyBBMnNEVRiNfyeajH
k1-1575907r
(['UBTxMuBnQwCobUpxgQMQ', 'UVgQcMuLMHuCvdJPEbZa', 'BoBuRkizqOoGIrcnKdkL', 'IDYDljKXsDfajdMZSHTV', 'nSSiJrTtQfnltbOhvWUv', 'leekathleen@example.net', 262, 'perezpatricia@example.org', Decimal('-533959985214553907880347948693.63324148979663176726156215249'), 'https://hill.com/wp-contentprivacy.html'], {'begin': 'lSqLniOoqxYItDeukapD', 'letter': 'mbBTFWLBAAsdQNhoeUWu', 'production': -3546120093.9101, 'past': datetime.datetime(2016, 11, 6, 18, 15, 1), 'drug': -29630.3663326707, 'present': -51749807124277.3, 'good': 1911, 'southern': 'EaJaouSqDJecbdJZturi', 'none': 'bzEbchIXdkfbPszdbPmB', 'beautiful': 2315}, {'take': {0: datetime.datetime(1979, 2, 9, 20, 6, 32), 1: ['http://blanchard.com/posts/postssearch.asp', 'eyOFbMgJDVBvHJcVQecY', 'melissanoble@example.net'], 2: {0: 9525, 1: 'victoriajones@example.com', 2: [datetime.datetime(2002, 10, 2, 5, 18, 42), 'DiABZjAdOLMOpBLNEUHC']}}, 'son': {1: 7976, 2: [1952, 2295, 'oholmes@example.org'], 3: {1: 'MtcCstAHTIsfirDyPZaA', 2: 'williamsbenjamin@example.org', 3: ['OVbtFiFUfHMvtfpIvzfq', 'CkyKLHWtxZoSMohOlZxD']}}, 'build': {2: 'http://munoz-cunningham.com/listregister.asp', 3: [39445.420028368, datetime.datetime(1982, 5, 15, 21, 40, 31), 7613], 4: {2: 'https://www.wright.com/explorefaq.htm', 3: -747788701919.703, 4: ['BwsyBqsXBLFediJnuadz', 5232]}}, 'major': {3: 'uQywiJzpStzsfVzeydIb', 4: [5285, 'bellmegan@example.com', Decimal('13643880946233419338490611479883994244038422660712924459842967394420958219281339340894537943982755.72183838529930169635960503603484140907480')], 5: {3: 'WGHMzNrGZVnZDTYSSiwv', 4: 1098, 5: [Decimal('581667631758055.948919259694472081221547906118090694694246000805018461205162489186334117'), 55655088534506.9]}}, 'alone': {4: 'floUhEjRbrcSlAgcUpKD', 5: [Decimal('40460646468928113699089077054899257009329837469260688073185695917293247833429683896.714863632410773250047819137693371237432869444562'), Decimal('908359844318342044820741412710171302924614850402079414883000182982603257371834095009905194897618.522803095432216'), 9565], 6: {4: 7628, 5: 'http://www.morales.com/tags/blogauthor.php', 6: ['ywofrJlhNlRygzWaQyFG', 'LCPEXbaOMvySYXOJWTpr']}}, 'best': {5: 'SpDVimYBzlhDJuNqkCWZ', 6: ['http://huang-wilson.org/tagterms.php', 'pLHKBjnndXRwnCHWsOGA', 'lrBxOnsJaIVKUGzxzFmD'], 7: {5: 'nHqumVtLaojDxgWRSRWt', 6: datetime.datetime(1981, 3, 11, 7, 30, 36), 7: [-65287331886129.9, 'mCPAlfJeIDeJPSjXhYJh']}}, 'bed': {6: 603974611145.587, 7: [datetime.datetime(2022, 4, 5, 11, 27, 34), 6993, 'gzDaXfcJCtnKAPAcLFhP'], 8: {6: 'kerjRozaRPCDbZTQDmgp', 7: 'wdKSJclVYlacchXshEed', 8: ['uyonjvlKDYXjQHHQUFdV', 'http://alvarez.net/categorieshomepage.html']}}, 'maintain': {7: 'IGgpKZaLhMvnHemTlGPU', 8: [datetime.datetime(1981, 1, 9, 19, 3, 5), 'tkOvlzaRCBKBtvyyqYNF', 958512.14122953], 9: {7: datetime.datetime(1985, 8, 2, 17, 12, 38), 8: -90490922.8079221, 9: [1267, 1876]}}, 'series': {8: 'scottdanielle@example.net', 9: ['http://ferguson.com/postsabout.html', 'http://www.johnson.net/explore/posts/categorycategory.html', 'gRlIYioszFWXCQrIOxnh'], 10: {8: 'http://cole.com/search/categorieshome.html', 9: 'TYUeKnLxGuUmzRJkkgvS', 10: ['zkVQuXYdxBGvUtFwUoPS', 2525]}}, 'final': {9: datetime.datetime(2002, 4, 27, 18, 23, 7), 10: [8014, Decimal('-321878064168199974704821670217881200123635942527.2989283812924630'), 7818], 11: {9: 3947, 10: datetime.datetime(1973, 11, 6, 4, 2, 39), 11: [746, -233.858433632642]}}})
(-68966651.3545535, 'tSldfjPXVumsygGlregG', Decimal('8655211367549380687362246206540408714340924126865723445408313926282213290900553.7363316869323588360628883398509312389090488'), 'http://www.boyle-neal.info/tag/wp-content/categoryauthor.html', 1321, -21.578299715633, -910.317290443833, 'EoLTtwTlZktihakadESE')

生成时间数据

用faker生成日期、星期、时区等时间数据:

# VX公众号:juzicode/桔子code
# www.juzicode.com
import faker
f = faker.Faker()  
print(f.date())         # 日期
print(f.time())         # 时间
print(f.date_time())    # 日期和时间 
print(f.timezone())     # 时区
print(f.iso8601())      # iso8601时间
print(f.year())         # 年
print(f.month())        # 月
print(f.day_of_month()) # 日 
print(f.day_of_week())  # 星期

运行结果:

1998-09-04
18:56:33
2002-05-29 22:59:23
Africa/Freetown
2019-10-22T21:23:01
1975
03
20
Wednesday

生成颜色数据

用faker生成颜色名称、16进制颜色数值、RGB颜色数值:

# VX公众号:juzicode/桔子code
# www.juzicode.com
import faker
f = faker.Faker()  
print(f.color_name())  
print(f.hex_color())     
print(f.rgb_color())  

运行结果:

LightCyan
#cc958d
21,126,12

命令行工具

faker还提供了命令行工具,直接在命令行中生成数据。下面的例子在命令行生成5条地址信息,其中-l指定本地化语言,-r后面带生成数据的条数:

E:\juzicode\faker造假高手>faker address  -l zh_CN  -r 5  
海南省成都市清浦胡街M座 379873

贵州省雪梅县大兴辛集路z座 521807

江苏省梧州市涪城康路B座 763552

陕西省宁德县沈河吴路m座 763816

吉林省坤县静安姚街L座 838843

你还可以通过-h参数查看帮助解锁到更多的技能:

E:\juzicode\faker造假高手>faker -h
usage: faker [-h] [--version] [-v] [-o output] [-l LOCALE] [-r REPEAT] [-s SEP] [--seed SEED] [-i [INCLUDE ...]] [fake] [fake argument ...]

faker version 25.6.0

positional arguments:
  fake                  name of the fake to generate output for (e.g. profile)
  fake argument         optional arguments to pass to the fake (e.g. the profile fake takes an optional list of comma separated field names as the first argument)

options:
  -h, --help            show this help message and exit
  --version             show program's version number and exit

通过前面的介绍,我们了解了faker的基本用法,本文列出的案例只是faker库的一小部分功能,你还可以用faker生成ip地址、邮箱地址、用户代理等互联网数据,也可以生成文本数据、财务数据、地理数据、文件名称等。

扩展阅读:

  1. Python轮子 – 桔子code (juzicode.com)
  2. https://faker.readthedocs.io/

发表评论

您的电子邮箱地址不会被公开。 必填项已用*标注