Compare commits

8 Commits

Author SHA1 Message Date
Денис Будяк
b70cc22f12 Started a search diary and a list of examined files 2023-11-21 00:15:47 +03:00
Денис Будяк
c53605582a Made распечатай-лексемы-из-файла.py directly executable 2023-11-21 00:10:06 +03:00
Денис Будяк
2c35464854 Problem with minified js files when opened as cp1251: came up with a workaround, seems to work 2023-11-20 23:47:14 +03:00
Денис Будяк
e7eb800c83 Started looking for Russian names; added a selection that excludes the repositories where
the absence of Cyrillic has already been established (not quite precisely)
2023-11-20 23:31:54 +03:00
Денис Будяк
ec7ba22b64 debugging utility распечатай-лексемы-из-файла.py 2023-11-20 23:24:55 +03:00
Денис Будяк
01c2e243bd added .gitignore 2023-11-20 23:02:04 +03:00
Денис Будяк
113176257f Seems to look plausible now 2023-11-20 23:01:54 +03:00
Денис Будяк
d091cc4335 First version of the script for finding Russian names,
thanks to Глеб NNN
2023-11-20 22:29:46 +03:00
9 changed files with 42651 additions and 1 deletions

3
.gitignore vendored Normal file

@ -0,0 +1,3 @@
иривк/ЯзыкиКоторыеМыНеЗаказывали.txt
иривк/log.txt
иривк/cloned_repos

17
.vscode/launch.json vendored Normal file

@ -0,0 +1,17 @@
{
    // Use IntelliSense to learn about possible attributes.
    // Hover to view descriptions of existing attributes.
    // For more information, visit: https://go.microsoft.com/fwlink/?linkid=830387
    "version": "0.2.0",
    "configurations": [
        {
            "name": "Python: ищи-русские-имена-в-коде.py",
            "type": "python",
            "request": "launch",
            "program": "ищи-русские-имена-в-коде.py",
            "console": "integratedTerminal",
            "cwd": "${workspaceFolder}/иривк",
            "justMyCode": true
        }
    ]
}


@ -1,4 +1,4 @@
A list of open-source projects that use Russian in identifiers, commit messages, and/or accompanying documentation. If you want to add your project to this list, file a bug. We may refuse to include any project in the list at our discretion.
A list of open-source projects that use Russian in identifiers, commit messages, and/or accompanying documentation. If you want to add your project to this list, file a bug.
About 40,000 repositories found with big-data queries can be found here:
@ -47,3 +47,5 @@ https://github.com/z-techno/cryptopro - Adapter for the native Кр
https://github.com/vataninandrey/raox-models examples for the RAO modeling language; CODE, commits, comments
There is also a script in the иривк directory that takes a list of repository URLs and tries to find Russian names in those repositories.
See the source text for details.
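
The idea of looking for Russian names among identifiers, rather than in comments or strings, can be illustrated with a minimal sketch. It uses Python's standard `tokenize` module as a stand-in for a general-purpose lexer, so it handles Python sources only; the helper name `cyrillic_identifiers` is hypothetical and not part of the script in иривк.

```python
import io
import re
import tokenize

# Any Russian letter; ё/Ё are listed separately because they lie outside а-я.
CYRILLIC = re.compile('[а-яА-ЯёЁ]')


def cyrillic_identifiers(source):
    """Return the identifiers in Python source text that contain Russian letters.

    Hypothetical helper for illustration. Comments and string literals are
    skipped because tokenize reports them as COMMENT and STRING tokens,
    not as NAME tokens.
    """
    found = []
    for tok in tokenize.generate_tokens(io.StringIO(source).readline):
        if tok.type == tokenize.NAME and CYRILLIC.search(tok.string):
            found.append(tok.string)
    return found


# Only the variable name counts; the comment and the string literal do not.
sample = 'счётчик = 1  # комментарий\nname = "строка"\n'
print(cyrillic_identifiers(sample))
```

A lexer library that covers many languages is needed to apply the same check beyond Python, which is presumably why the script relies on one.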

70
иривк/repos.txt Normal file

@ -0,0 +1,70 @@
https://github.com/budden/dlist
https://github.com/DonCuponesInternet/rails_admin
https://github.com/aravindgd/rails_admin
https://github.com/athompson11/Soteria-v3
https://github.com/InsaneHyena/tgstation
https://github.com/Boggart/-tg-station
https://github.com/HippieStation/HippieStation13
https://github.com/ExcessiveUseOfCobblestone/tgstation
https://github.com/Judopay/Judo-Ruby
https://github.com/prgTW/monolog
https://github.com/bevis-ui/bevis-and-bt-speech
https://github.com/gdamjan/vezilka
https://github.com/LopatkinEvgeniy/robot-parser
https://github.com/kimshrier/elixir
https://github.com/Suomaa/FTT
https://github.com/dle-modules/DLE-Charset-Converter
https://github.com/Felix0830/gitextensions
https://github.com/thexide/JavaScript
https://github.com/istrel/basisjs
https://github.com/p2rv/Univer
https://github.com/ruLait/wp-steam-shortcode
https://github.com/Alexponomarev7/plotter
https://github.com/dosvid/landing
https://github.com/vlascoder/otrs
https://github.com/krf/kdevplatform
https://github.com/Nukkit/Nukkit
https://github.com/vsuh/1S_unloads
https://github.com/alshalan/Mobile-OpenVPN
https://github.com/LionZXY/HackathonBMSTU
https://github.com/splitice/Elastica
https://github.com/lolosoft/CashBook
https://github.com/fredformout/InstagramKit
https://github.com/nin-jin/pms-jin
https://github.com/mcepl/youtube-dl
https://github.com/FreeZbe/ACE3
https://github.com/nikolauska/ACE3
https://github.com/ddiachkov/chrno_audit
https://github.com/pershoot/vision-2635
https://github.com/byakatat/selenium-training
https://github.com/Flexberry/ember-flexberry-designer
https://github.com/otavioarc/freeCodeCamp
https://github.com/anketolog/AnketologClient-php
https://github.com/AKosterin/akosterin.github.io
https://github.com/fandrej/glonassd
https://github.com/Scorpibear/chegura
https://github.com/mentatDemon/TOPP_TC
https://github.com/joncol/jcon
https://github.com/kerneldevs/caf-kernel
https://github.com/nasser-embedded/linux
https://github.com/fmaker/kernel_msm
https://github.com/galaxys-cm7miui-kernel/ICS-kernel-SGS
https://github.com/coolya/android_kernel_samsung_msm
https://github.com/pacificIT/linux-2.6.36
https://github.com/DerTeufel/cm7
https://github.com/SergOmarov/Hight-level-library-for-Lua
https://github.com/EKOsh/TeleMonBot
https://github.com/expdevelop/d812
https://github.com/ms301/TelegraphAPI
https://github.com/johnner/tran
https://github.com/esclkm/pagemasseditor
https://github.com/EvercodeLab/EvercodeHipchatMonologBundle
https://github.com/JeffPyeBrook/WP-e-Commerce
https://github.com/ticketmaster-api/ticketmaster-api.github.io
https://github.com/dreikanter/boodka
https://github.com/daveloyall/urbit
https://github.com/LK4D4/criu
https://github.com/rbabichev/Astrafit
https://github.com/mishakos/InsuranceSystem.Library
https://github.com/vapkarian/soccer-analyzer
https://github.com/TrayEdge/FloatingActionButton


@ -0,0 +1,9 @@
2021-11-20:
https://github.com/kenaku/style - Russian was found in this repository: ['cloned_repos/kenaku/style/node_modules/browser-sync/node_modules/browser-sync-ui/node_modules/weinre/web/client/nls/English.lproj/localizedStrings.js', 'cloned_repos/kenaku/style/node_modules/gulp-svgmin/node_modules/svgo/plugins/transformsWithOnePath.js']
did not look at the first file; the second is a typo (the letter "С" in the middle of an English name)
https://github.com/jumper423/yii2-vk - Russian was found in this repository: ['cloned_repos/jumper423/yii2-vk/VK.php'] - a typo
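
Typos like these could perhaps be filtered out automatically. The sketch below is one possible heuristic, not something the script does: it flags a name when it contains Latin letters and every Cyrillic letter in it is a look-alike of a Latin one. The helper name `probably_typo`, the look-alike set, and the sample names are all assumptions made for illustration.

```python
import re

CYRILLIC = re.compile('[а-яА-ЯёЁ]')
LATIN = re.compile('[A-Za-z]')

# Cyrillic letters that look like Latin ones (illustrative, incomplete set),
# written as escapes so they cannot be confused with their Latin twins:
# А В Е К М Н О Р С Т Х а е о р с у х
LOOKALIKES = set(
    '\u0410\u0412\u0415\u041a\u041c\u041d\u041e\u0420\u0421\u0422\u0425'
    '\u0430\u0435\u043e\u0440\u0441\u0443\u0445'
)


def probably_typo(name):
    """Guess whether Cyrillic letters in a name are keyboard-layout slips.

    Hypothetical heuristic: the name mixes Latin and Cyrillic letters, and
    all of its Cyrillic letters could be mistaken for Latin ones.
    """
    cyrillic_letters = CYRILLIC.findall(name)
    if not cyrillic_letters or not LATIN.search(name):
        return False
    return all(ch in LOOKALIKES for ch in cyrillic_letters)


# A made-up name with a Cyrillic "С" (U+0421) inside an English word.
print(probably_typo('path\u0421ount'))
# A deliberately Russian name is not flagged.
print(probably_typo('счётчик'))
```

Names that mix both alphabets on purpose and use letters outside the look-alike set are left alone by this check.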


@ -0,0 +1,353 @@
2023-11-20:
https://github.com/CodersGit/IssueTracker
https://github.com/devtype-blogspot-com/CPP-Examples
https://github.com/flashrom/flashrom
https://github.com/arasuarun/shogun
https://github.com/sanuj/shogun
https://github.com/tomash/spree
https://github.com/abhishekjain16/spree
https://github.com/NerdsvilleCEO/spree
https://github.com/zaeznet/spree
https://github.com/Nevensoft/spree
https://github.com/agient/agientstorefront
https://github.com/imella/spree
https://github.com/ntb-ch/buildroot
https://github.com/robvogelaar/buildroot
https://github.com/synopsys-usb/buildroot
https://github.com/deorerohan/motionpie
https://github.com/mwksoul/buildroot-rpiemu
https://github.com/dornerworks/buildroot
https://github.com/felfert/tdesktop
https://github.com/schubergphilis/Seccubus
https://github.com/johnnyslt/OLD_android_kernel_shooter
https://github.com/Dm84/xogame.symfony
https://github.com/mahata/build-web-application-with-golang
https://github.com/stonegithubs/build-web-application-with-golang
https://github.com/sky54521/symphony
https://github.com/billho/symphony
https://github.com/tentative13/BusinessCards2_0
https://github.com/dc-artem/python-behave
https://github.com/suenot/d3js-maps-uk-custom
https://github.com/shurius/selenium-py-training-yan
https://github.com/drnextgis/QGIS
https://github.com/boundlessgeo/QGIS
https://github.com/Brainiq7/Ananse
https://github.com/hxddh/youtube-dl
https://github.com/pboonstoppel/linux-3.1-nv-rel15r7-cpuquiet
https://github.com/dscho/dovecot
https://github.com/simon-db/django-cked
https://github.com/GNOME/orca
https://github.com/rdnz/JavaScript-Garden
https://github.com/kshmirko/libim7-py3
https://github.com/jtanguy/talks
https://github.com/varya/func-2015
https://github.com/lxde/lxqt-panel
https://github.com/krichter722/coreutils
https://github.com/Hates/spree_wishlist
https://github.com/wxstars/libphutil
https://github.com/bgn9000/ShunS4
https://github.com/Comcast/rack
https://github.com/jodosha/rack
https://github.com/fabianrbz/rack
https://github.com/phusband/PiaNO
https://github.com/CodersGit/IssueTracker
https://github.com/YSRKEN/orikou_hisyo
https://github.com/BRUTALISM/BRU-3
https://github.com/ledermann/unread
https://github.com/hubgit/unoconv
https://github.com/dobromiraboycheva/JS-UI-and-DOM-Monkey-Gland
https://github.com/bmitch/Faker
https://github.com/kkiernan/Faker
https://github.com/vortexwolf/2ch-Browser
https://github.com/merhalak/eltech-computer-graphics-lab5
https://github.com/dathbezumniy/car_rental_project
https://github.com/kenaku/style
https://github.com/BridgeAR/jshint
https://github.com/fairfieldt/jshint
https://github.com/Jeremy017/jshint
https://github.com/louwers/jshint
https://github.com/victor-gonzalez/AliPhysics
https://github.com/xmhubj/selenium
https://github.com/quoideneuf/selenium
https://github.com/mach6/selenium
https://github.com/joshuaduffy/selenium
https://github.com/shpits205/GeekWork6
https://github.com/c0b/docker-erlang-otp
https://github.com/iTALC/italc
https://github.com/ozodrukh/RippleDrawable
https://github.com/mweimerskirch/symfony
https://github.com/torinaki/symfony
https://github.com/javiereguiluz/symfony
https://github.com/geoffrey-brier/symfony
https://github.com/hiddewie/symfony
https://github.com/enumag/symfony
https://github.com/fabpot/symfony
https://github.com/WhiteEagle88/symfony
https://github.com/evancauwenberg/symfony
https://github.com/mathroc/symfony
https://github.com/chris001/SuiteCRM
https://github.com/vishmehra/SuiteCRM
https://github.com/camikazegreen/drupal
https://github.com/JordanNavratil/jaimeskitchen
https://github.com/brennascurlock/Glenview-Dental
https://github.com/abdcon02/fansite_drupal
https://github.com/BabaYaga64/PetModuleDrupal
https://github.com/schultetwin/ejrst
https://github.com/RepEquity/distroteq
https://github.com/ferjflores/canaan
https://github.com/tfmertz/drupal-coffee
https://github.com/loyison/drupal_module
https://github.com/geoff-winner/Drupal1_SuperBaseball
https://github.com/abdcon02/Pet_CustomModules_Drupal
https://github.com/dlinhle/TEDxPerth
https://github.com/IriiAlfaro/Tienda-Proyecto
https://github.com/chitraphp/shift_cipher_assessment
https://github.com/dha-test/second
https://github.com/jfranti/JCVDSB
https://github.com/kdv24/page_one
https://github.com/BabaYaga64/CameronsCoffeeShop
https://github.com/PierreLgrd/drupal-pierre
https://github.com/ozin7/test
https://github.com/joshkoenig/golden-god-demo
https://github.com/rolfington/rolfington06
https://github.com/abdcon02/cameronsCoffee_task_drupal
https://github.com/chitraphp/drupal_assessment_tests
https://github.com/kdv24/kportfolio
https://github.com/ninthlink/userbase
https://github.com/Platron/signature
https://github.com/i-suhar/generator-sp
https://github.com/njmube/erpnext
https://github.com/elementary/website
https://github.com/withinsoft/znc
https://github.com/Sidnioulz/SandboxExo
https://github.com/stps/libass
https://github.com/mesokeen/My-Personal-Portfolio
https://github.com/asteven/beets
https://github.com/accesso/beets
https://github.com/bj-yinyan/beets
https://github.com/buglabs/oe-buglabs
https://github.com/sdgdsffdsfff/phpdaemon
https://github.com/sedoy1/posh_rosreestr
https://github.com/lynxis/trashrom
https://github.com/ThisIsBrain/mycv
https://github.com/alexanderby/darkreader
https://github.com/SoftFx/TTFixClient
https://github.com/zahasoft/logic
https://github.com/esmakula/moskito
https://github.com/Ezaki113/vk-callback
https://github.com/scientistnik/start_rvec
https://github.com/jmerkow/ITK
https://github.com/BRAINSia/ITK
https://github.com/sasezaki/zf2
https://github.com/FundingGates/activeadmin
https://github.com/deivid-rodriguez/activeadmin
https://github.com/vraravam/activeadmin
https://github.com/yeti-switch/active_admin
https://github.com/siutin/activeadmin
https://github.com/cntd/sidekiq-overlord
https://github.com/irnc/be-be-x-old
https://github.com/zhaochao/fuel-library
https://github.com/ckuwanoe/the_role_bootstrap3_ui
https://github.com/maibatsu/andyorlov
https://github.com/dopeghoti/daedalus
https://github.com/LLA-Gaming/Ice-Station
https://github.com/qexyorg/webMCR-Gallery
https://github.com/delix/EDUCacheSim2
https://github.com/lbxl2345/glibc
https://github.com/Handzhiyski/po-homework
https://github.com/LilianVachkov/po-homework
https://github.com/ponkotuy/MyFleetGirls
https://github.com/tss/qemu
https://github.com/zuban32/qemu-gsoc
https://github.com/bkoppelmann/qemu-tricore
https://github.com/poulacou/libmodbus
https://github.com/Javran/kc3-translations
https://github.com/AppKiv/TCRM
https://github.com/iris4acure/cgm-remote-monitor
https://github.com/adipanait/cgm-remote-monitor
https://github.com/skubigolf/cgm-remote-monitor
https://github.com/WholeEnchilada/cgm-remote-monitor
https://github.com/dbogardaustin/cgm-remote-monitor
https://github.com/editdata7/cgm-remote-monitor
https://github.com/TerriV/cgm-remote-monitor
https://github.com/iwalktheline86/cgm-remote-monitor
https://github.com/mjsell/cgm-remote-monitor
https://github.com/ryandexcom/cgm-remote-monitor
https://github.com/ElouisaB23/cgm-remote-monitor
https://github.com/silvalized/cgm-remote-monitor
https://github.com/JacknSundrop/cgm-remote-monitor
https://github.com/nomareel/cgm-remote-monitor
https://github.com/Kraey/cgm-remote-monitor
https://github.com/Elysespump/cgm-remote-monitor
https://github.com/HidroRaul/cgm-remote-monitor
https://github.com/isleypie/cgm-remote-monitor
https://github.com/Wasson101/cgm-remote-monitor
https://github.com/kamrausch/cgm-remote-monitor
https://github.com/drex304/cgm-remote-monitor
https://github.com/hannahggdb/cgm-remote-monitor
https://github.com/rishii7/vscode
https://github.com/Zalastax/vscode
https://github.com/mmyagaa/Timetable
https://github.com/kib357/CheckInTmnSite
https://github.com/renothing/kanboard
https://github.com/sutra/openid-server
https://github.com/BrandyMint/votes
https://github.com/lomaster1/CitiesWar
https://github.com/nikopartanen/nikopartanen.github.io
https://github.com/darktable-org/rawspeed
https://github.com/dustinrb/mezzanine
https://github.com/promil23/mezzanine
https://github.com/concrete5/concrete5
https://github.com/mwmaleks/node-openssl-p12
https://github.com/uynil/rainloop-webmail
https://github.com/LordAro/OpenTTD
https://github.com/matthijskooijman/openttd
https://github.com/irnc/federalism
https://github.com/ZJU-CC98/Forum
https://github.com/gloryleague/avahi
https://github.com/TykTechnologies/tyk
https://github.com/mothership-ec/NodeBB
https://github.com/gvms/gvms
https://github.com/80LevelElf/PersonalBlog
https://github.com/Zalmoxisus/momentjs-datetimepicker
https://github.com/jinghm318/datetimepicker
https://github.com/dedepete/Luncher
https://github.com/irnc/active
https://github.com/upderground/nr
https://github.com/majaradichevich/majaradichevich.github.io
https://github.com/shiver/i3lock
https://github.com/astephan/MyFacts
https://github.com/m-box/smartcity.melitopol
https://github.com/devmru/ClashOfClansBot
https://github.com/CawaKharkov/golang-test-task
https://github.com/DCeres/InsideDrag
https://github.com/mouhb/cjdns
https://github.com/philipl/gvfs
https://github.com/mosoft521/dropwizard
https://github.com/alexusmai/ruslug
https://github.com/kestereverts/openttd-ijpack
https://github.com/R4wizard/OpenTTD-patches
https://github.com/benjeffery/openttd
https://github.com/prepor/pinbo
https://github.com/jbygdell/subsurface
https://github.com/placiano/NBKernel_Lollipop
https://github.com/Dwightun/practice
https://github.com/MichaelKuzyk/toptour
https://github.com/shabuninil/fileup
https://github.com/Vnr/obd-search
https://github.com/svisystem/atk4
https://github.com/noonehos/router
https://github.com/TelerikAcademy/JavaScript-Fundamentals
https://github.com/maxvas/qjs
https://github.com/blackmamba85/pica
https://github.com/microcom/odoo
https://github.com/anisimow/yacassa
https://github.com/ZaprudnovAA/Radiorecord
https://github.com/aplanas/rally
https://github.com/sleshJdev/Fireworks.NET
https://github.com/urandom/gearshift
https://github.com/ChaosPower/PHPoAuthLib
https://github.com/elliotchance/PHPoAuthLib
https://github.com/defragmentator/PHPoAuthLib
https://github.com/tsdl2013/actor-platform
https://github.com/mxw0417/actor-platform
https://github.com/elfmz/far2l
https://github.com/inqueez/students-web-calendar
https://github.com/chriso/validator.js
https://github.com/EndyKaufman/django-postgres-angularjs-blog
https://github.com/WasimAhmad/Serenity
https://github.com/splav/falt_algo_764
https://github.com/rsppv/moodle-clicker
https://github.com/sonyarianto/WebFundamentals
https://github.com/Skorezore/Gaem
https://github.com/solo-framework/solo-formrestore
https://github.com/brabo/libopencm3
https://github.com/daniel-thompson/libopencm3
https://github.com/CyanogenMod/android_external_bluetooth_glib
https://github.com/igoldin74/java_for_testers
https://github.com/Norpadon/vk_async
https://github.com/MirantisWorkloadMobility/CloudFerry
https://github.com/archyufa/CloudFerry
https://github.com/askl56/homebrew-cask
https://github.com/cprecioso/homebrew-cask
https://github.com/bgandon/homebrew-cask
https://github.com/lalyos/homebrew-cask
https://github.com/hristozov/homebrew-cask
https://github.com/scottsuch/homebrew-cask
https://github.com/RJHsiao/homebrew-cask
https://github.com/jpodlech/homebrew-cask
https://github.com/griddynamics/jagger
https://github.com/Garykom/Print2FR
https://github.com/0x7F800000/gcc
https://github.com/schivei/gcc
https://github.com/Gd58/gcc
https://github.com/paranoiacblack/gcc
https://github.com/redbrain/gccrs
https://github.com/moshpirit/Conversations
https://github.com/youprofit/scikit-image
https://github.com/jfranklin9000/urbit
https://github.com/MikhailArkhipov/RTVS
https://github.com/angularbrasil/angular.js
https://github.com/jvkops/angular.js
https://github.com/sapphoo/angular.js
https://github.com/zuzusik/angular.js
https://github.com/evil-wolf/angular.js
https://github.com/idoo/angular.js
https://github.com/fredyang/angular.js
https://github.com/elton0895/angular.js
https://github.com/oddui/angular.js
https://github.com/langwz/angular.js
https://github.com/jagdeesh109/angular.js
https://github.com/Lucassssss/angular.js
https://github.com/skyl/angular.js
https://github.com/nahakiole/angular.js
https://github.com/lorijlh/angularjs
https://github.com/RubyLouvre/angular.js
https://github.com/Galiats/AngularJs
https://github.com/yar229/WebDavMailRuCloud
https://github.com/Jugolo/PHP-Fusion
https://github.com/dkillebrew/vibe.d
https://github.com/xZise/pywikibot-core
https://github.com/PersianWikipedia/pywikibot-core
https://github.com/Amice13/ukr_stemmer
https://github.com/ronanchilvers/PHPCI
https://github.com/Lechus/PHPCI
https://github.com/yelizariev/mail-addons
https://github.com/mansayk/fastmorph
https://github.com/LibrIT/passhport
https://github.com/luiscauro/mvp
https://github.com/btkostner/mvp
https://github.com/andykarpov/radio-86rk-wxeda
https://github.com/UkrLink/beeftags
https://github.com/UIKit0/rhodes
https://github.com/uabboli/otp
https://github.com/wetek-enigma/enigma2
https://github.com/BlackPole/bp-dvbapp
https://github.com/alexander255/KISS
https://github.com/PhenomRetroShare/RetroShare
https://github.com/alex/changes
https://github.com/Furiten/riichi-api
https://github.com/ksurendra/rtbkit
https://github.com/jumper423/yii2-vk
https://github.com/boiled-sugar/mkvtoolnix
https://github.com/afelicioni/DataTables-Plugins
https://github.com/timchenxiaoyu/Diamond
https://github.com/tuenti/Diamond
https://github.com/disqus/Diamond
https://github.com/hvnsweeting/Diamond
https://github.com/paulbilkis/alternative-exam
https://github.com/espressif/esp-idf
https://github.com/armada-ai/esp-idf
https://github.com/KasaiDot/xvid4psp
https://github.com/mbeloshitsky/mbeloshitsky.github.io
https://github.com/RamirezWillow/brackets
https://github.com/ryanackley/tailor
https://github.com/fernandovm/SQLite.Net-PCL
https://github.com/molinch/SQLite.Net-PCL
https://github.com/dle-modules/xFieldDesign
https://github.com/fginter/docs
https://github.com/trydalcoholic/opencart-materialize


@ -0,0 +1,156 @@
import os
import pathlib
import re
import requests
from subprocess import call
import threading
import pygments
from pygments.lexers import get_lexer_for_filename
import pygments.token
from concurrent.futures import ThreadPoolExecutor
### Tested on python 3.7.5
### To install the dependencies (re is part of the standard library and is not installed with pip):
### python3.7 -m pip install requests pygments
### Before running, delete the cloned_repos subdirectory if it exists; otherwise URLs that were processed earlier
### will be re-processed incorrectly. Run without arguments. The repository list is plain text, repos.txt.
### You must set up cloning of GitHub repositories by URLs of the form git@github.com:user/repo, i.e. add your
### ssh key to GitHub.
возможныеПутиК_README_mdВнутриРепозитория = ["/blob/main/README.md", "/blob/master/README.md", "/README.md", "/", ""]
найденныеЯзыкиКоторыеМыНеЗаказывали = []
интересныеЯзыки = ['Ruby', 'VB.net', 'GLSL', 'Perl', 'PHP', 'Python', 'Common Lisp', 'OCaml', 'Java',
    'C#', 'JavaScript', 'C', 'C++', 'Prolog', 'Go', 'Rust', 'Scheme', 'Transact-SQL', 'PL-SQL', 'tsql', 'PL/1', 'plsql', 'pli', 'Pascal', 'Delphi', 'Modula-2']
неинтересныеРасширенияФайлов = ['.md','.txt','.html','.xml','.XML','.json','.jpg','.png','.svg','.ttf','.sample']
def НайденЯзыкКоторыйМыНеЗаказывали(lexer_name, url, log, файлДляНезаказанныхЯзыков):
    # Report each language outside the list of interesting ones only once
    if lexer_name not in найденныеЯзыкиКоторыеМыНеЗаказывали:
        найденныеЯзыкиКоторыеМыНеЗаказывали.append(lexer_name)
        log.write(f"{url} - The lexer detected a language that is not in the list of allowed ones. {lexer_name} \n")
        print(f"{url} - The lexer detected a language that is not in the list of allowed ones. {lexer_name} ")
        файлДляНезаказанныхЯзыков.write("%s\n" % lexer_name)
def download_repo(url, log):
    httpsPrefix = "https://github.com/"
    assert(url.startswith(httpsPrefix))
    # https://github.com/user/repo -> cloned_repos/user/repo
    repo_dir = os.path.join("cloned_repos",*url.split('/')[3:])
    gitUrl = url.replace(httpsPrefix, "git@github.com:")
    try:
        # call() does not raise when git fails, so the exit code is checked explicitly
        exit_code = call(['git', 'clone', '--depth', '1', gitUrl, repo_dir])
        if exit_code != 0:
            raise RuntimeError(f"git clone exited with code {exit_code}")
        #git.Repo.clone_from(url, repo_dir)
        return repo_dir
    except Exception as e:
        log.write(f"{gitUrl} - An error occurred while cloning the repository: {str(e)} \n")
        print(f"{gitUrl} - An error occurred while cloning the repository: {str(e)}")
        return 0
def analyze_readme(url, log):
    # Returns 1 if a README with Russian letters was found, 0 otherwise
    for suburl in возможныеПутиК_README_mdВнутриРепозитория:
        try:
            readme_url = url + suburl
            response = requests.get(readme_url)
            if response.status_code == 200:
                content = response.text
                # ёЁ lie outside the а-я range, so they are listed explicitly
                if not(re.search('[а-яА-ЯёЁ]', content)):
                    log.write(f"{readme_url} - No Russian characters in README. \n")
                    print(f"{readme_url} - No Russian characters in README.")
                    return 0
                else:
                    return 1
        except Exception as e:
            print(f"{readme_url} - README not found.")
    return 0
def analyze_repo(url, log, файлДляНезаказанныхЯзыков):
    try:
        print(f"{url} STP downloading and analyzing README")
        res = analyze_readme(url, log)
        if not (res == 1):
            return
        print(f"{url} STP Downloading the repository")
        res = download_repo(url, log)
        if res == 0:
            return
        print(f"{url} Analyzing the repository")
        files_with_russian = []
        for root, dirs, files in os.walk(res):
            for file in files:
                file_path = os.path.join(root, file)
                file_ext = os.path.splitext(file_path)[1]
                неинтересноеРасширение = False
                for расш in неинтересныеРасширенияФайлов:
                    if file_ext.endswith(расш):
                        неинтересноеРасширение = True
                        break
                if неинтересноеРасширение:
                    continue
                lexer = None
                if file_ext:
                    try:
                        lexer = get_lexer_for_filename(file_ext)
                    except Exception:
                        print(f"{url}...{file_ext} - The lexer could not determine the language. ")
                        lexer = None
                if lexer is None:
                    continue
                if not(lexer.name in интересныеЯзыки):
                    НайденЯзыкКоторыйМыНеЗаказывали(lexer.name, url, log, файлДляНезаказанныхЯзыков)
                if (lexer.name in интересныеЯзыки):
                    # Returns True/False, or None if the file could not be read in this encoding
                    def ИщиРусскиеИменаВТакойКодировке(encoding):
                        try:
                            with open(file_path, 'r', encoding=encoding) as f:
                                content = f.read()
                            # Cheap check first: skip lexing if there are no Russian letters at all
                            if not re.search('[а-яА-ЯёЁ]',content):
                                return False
                            with open(file_path, 'r', encoding=encoding) as f:
                                лексемы = pygments.lex(f.read(), lexer)
                                for token, value in лексемы:
                                    # print(token)
                                    if pygments.token.is_token_subtype(token, pygments.token.Name):
                                        if re.search('[а-яА-ЯёЁ]', value):
                                            # anomaly with cp1251, at least in minified files
                                            if not (encoding=='cp1251' and value == 'п'):
                                                return True
                            return False
                        except Exception:
                            print(f"{url} - Error while parsing the file.")
                            return None
                    успех = ИщиРусскиеИменаВТакойКодировке('utf-8')
                    if успех is None:
                        успех = ИщиРусскиеИменаВТакойКодировке('cp1251')
                    if успех == True:
                        files_with_russian.append(file_path)
        if len(files_with_russian) == 0:
            log.write(f"{url} - No files containing Russian characters were found. \n")
            print(f"{url} - No files containing Russian characters were found.")
        else:
            log.write(f"{url} - Russian was found in this repository: {files_with_russian} \n")
            log.flush()
            print(f"{url} - Russian was found in this repository: {files_with_russian}")
        return
    except Exception as e:
        log.write(f"{url} - An error occurred: {str(e)} \n")
        print(f"{url} - An error occurred: {str(e)}")
        log.flush()
def main():
    # Read the links from the file
    with open("ЯзыкиКоторыеМыНеЗаказывали.txt", "w") as файлДляНезаказанныхЯзыков:
        with open("repos.txt", "r") as file:
            urls = file.readlines()
        urls = [url.strip() for url in urls]
        with open("log.txt", "w") as log:
            for url in urls:
                analyze_repo(url,log,файлДляНезаказанныхЯзыков)
main()
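
The mapping from a repository URL in repos.txt to the ssh clone URL and the local directory can be shown in isolation. This is a minimal sketch under the same assumptions as the script (GitHub HTTPS URLs, clones under cloned_repos); the helper name `clone_target` is hypothetical.

```python
import os

HTTPS_PREFIX = "https://github.com/"


def clone_target(url):
    """Return (ssh_url, local_dir) for a GitHub HTTPS repository URL.

    Hypothetical helper for illustration; it mirrors the URL handling
    in download_repo without running git.
    """
    if not url.startswith(HTTPS_PREFIX):
        raise ValueError(f"not a GitHub HTTPS URL: {url}")
    user_and_repo = url[len(HTTPS_PREFIX):].strip("/")
    ssh_url = "git@github.com:" + user_and_repo
    local_dir = os.path.join("cloned_repos", *user_and_repo.split("/"))
    return ssh_url, local_dir


ssh_url, local_dir = clone_target("https://github.com/budden/dlist")
print(ssh_url)
print(local_dir)
```

Raising `ValueError` instead of using `assert` keeps the check active even when Python runs with assertions disabled.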


@ -0,0 +1,40 @@
#!/usr/bin/python3.7
import os
import sys
import pathlib
import re
import requests
from subprocess import call
import threading
import pygments
from pygments.lexers import get_lexer_for_filename
import pygments.token
from concurrent.futures import ThreadPoolExecutor
## False positives have appeared. We try to print the Russian tokens that are
## such false positives.
## Takes a file name as its argument. The encoding can be changed in the source text.
кодировка = 'cp1251'
def main():
    file_path = sys.argv[1]
    file_ext = os.path.splitext(file_path)[1]
    try:
        # get_lexer_for_filename raises ClassNotFound rather than returning None
        lexer = get_lexer_for_filename(file_ext)
    except Exception:
        print("Could not determine the lexer")
        sys.exit(1)
    print("lexer = %s" % lexer.name)
    with open(file_path, 'r', encoding=кодировка, errors = 'ignore') as f:
        лексемы = pygments.lex(f.read(), lexer)
        for token, value in лексемы:
            # Print only identifier-like tokens that contain Russian letters
            if pygments.token.is_token_subtype(token, pygments.token.Name):
                if re.search('[а-яА-ЯёЁ]', value):
                    print("class = %s, text = %s" % (token, value))
main()
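
One possible source of such false positives, offered as an assumption rather than a confirmed diagnosis: in cp1251 every byte from 0xE0 to 0xFF decodes to a lowercase Russian letter, so text saved in another single-byte encoding can turn into Cyrillic when it is read as cp1251. The sketch below shows the Latin-1 letter "ï" (byte 0xEF) becoming "п"; the sample word is made up.

```python
import re

CYRILLIC = re.compile('[а-яА-ЯёЁ]')

# "naïve" saved as Latin-1; "ï" (U+00EF) is stored as the single byte 0xEF.
raw = 'na\u00efve'.encode('latin-1')

# Strict UTF-8 decoding rejects the lone 0xEF byte ...
try:
    raw.decode('utf-8')
    utf8_ok = True
except UnicodeDecodeError:
    utf8_ok = False

# ... while cp1251 maps the same byte to the Russian letter "п".
as_cp1251 = raw.decode('cp1251')

print(utf8_ok)
print(as_cp1251)
print(bool(CYRILLIC.search(as_cp1251)))
```

This matches the order in which the main script tries encodings: cp1251 is attempted only after UTF-8 decoding has failed, which is exactly when stray high bytes are present.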

File diff suppressed because it is too large