With thousands of channels out there, many of which have dual audio, a way to translate EPG data has been on my wish list for a very long time.
Finally I have found a way to do so using argos-translate, see https://github.com/argosopente…nslate?tab=readme-ov-file
Argos Translate is good in that it works offline from a downloaded "dictionary" and, more importantly, is free with no API call limits.
All of the other translators I am aware of cost a fortune when you're trying to translate large files. For context, the Sly Germany EPG alone has over 300 000 lines of data, which is millions of characters.
Below is the script I am currently using. Obviously you will need to install all the required modules and argos-translate, see its readme file for details.
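For anyone following along, the German to English language package has to be installed before the script will find it. This is only a rough sketch based on the package-install steps shown in the argos-translate readme. `pick_package` and `install_language_pair` are made-up helper names, and the index refresh and download need an internet connection:

```python
def pick_package(available, from_code, to_code):
    """Return the first package matching the language pair, or None."""
    return next(
        (p for p in available
         if p.from_code == from_code and p.to_code == to_code),
        None,
    )


def install_language_pair(from_code="de", to_code="en"):
    """Download and install one Argos Translate language package."""
    # Imported here so pick_package() can be used without argostranslate.
    import argostranslate.package

    argostranslate.package.update_package_index()
    available = argostranslate.package.get_available_packages()
    package = pick_package(available, from_code, to_code)
    if package is None:
        raise RuntimeError(f"No package found for {from_code} -> {to_code}")
    argostranslate.package.install_from_path(package.download())


# Run once before the first translation:
# install_language_pair("de", "en")
```

After that, `get_installed_languages()` in the script should return both `de` and `en`.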
I have tried several ways of speeding it up while still maintaining the correct XML format, but all my attempts to speed it up any further have failed.
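One idea that keeps the XML structure intact is to translate only the elements that carry readable text and leave everything else alone. The script already caches repeated strings, so the saving here would come from skipping elements such as channel display names. This is a sketch, not a tested replacement: the tag list is my assumption about which XMLTV tags matter, the function names are hypothetical, and it uses the standard library parser so it runs on its own (lxml elements can be walked with `iter()` in the same way).

```python
import xml.etree.ElementTree as ET

# Assumption: only these XMLTV tags hold text worth translating.
TRANSLATABLE_TAGS = ("title", "sub-title", "desc")


def collect_unique_texts(root, tags=TRANSLATABLE_TAGS):
    """Return the distinct non-empty texts of the chosen tags, first seen first."""
    seen = {}
    for elem in root.iter():
        if elem.tag in tags and elem.text and elem.text.strip():
            seen.setdefault(elem.text.strip(), None)
    return list(seen)


def apply_translations(root, mapping, tags=TRANSLATABLE_TAGS):
    """Write translations back; elements outside `tags` are left untouched."""
    for elem in root.iter():
        if elem.tag in tags and elem.text and elem.text.strip():
            key = elem.text.strip()
            elem.text = mapping.get(key, key)


def translate_tree(root, translate_fn, tags=TRANSLATABLE_TAGS):
    """Translate each unique string once and return how many were translated."""
    unique = collect_unique_texts(root, tags)
    mapping = {text: translate_fn(text) for text in unique}
    apply_translations(root, mapping, tags)
    return len(unique)
```

With the script below it could be called as `translate_tree(root, underlying_translation.translate)`. Whether categories, credits and so on should also be translated is a matter of taste, so the tag list is there to be adjusted.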
I am not sure if argos-translate supports multi-threading, but if it does I can't get it to work correctly. Half the issue is that it doesn't really support XML structure either.
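One way around the threading question might be separate processes: pull the strings out of the XML first, split them into chunks, and let each worker process load its own copy of the model. This is an untested sketch under that assumption. `chunked`, `translate_chunk` and `translate_parallel` are hypothetical names, and the only argos call used is `argostranslate.translate.translate(text, from_code, to_code)` from the readme.

```python
from concurrent.futures import ProcessPoolExecutor


def chunked(items, size):
    """Split a list into consecutive chunks of at most `size` items."""
    return [items[i:i + size] for i in range(0, len(items), size)]


def translate_chunk(texts):
    """Worker: translate one chunk of German strings in its own process."""
    # Imported inside the worker so each process loads its own model.
    import argostranslate.translate
    return [argostranslate.translate.translate(t, "de", "en") for t in texts]


def translate_parallel(texts, worker=translate_chunk, workers=4,
                       chunk_size=200, executor_cls=ProcessPoolExecutor):
    """Translate `texts` chunk by chunk and return {original: translated}."""
    chunks = chunked(texts, chunk_size)
    with executor_cls(max_workers=workers) as pool:
        results = pool.map(worker, chunks)  # map() keeps the input order
        flat = [t for chunk in results for t in chunk]
    return dict(zip(texts, flat))
```

The XML itself is only touched before and after, so the structure is not at risk. Each process holds its own model in memory, and the translation engine may already use several CPU threads, so any gain on a laptop is not guaranteed and would need measuring. On systems that start workers by spawning, the call has to sit under `if __name__ == "__main__":`.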
From what I have read, a decent GPU may be the fastest way to do things, but again I'm limited to my lappie with its built-in AMD chipset graphics.
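For reference, the argos-translate readme describes GPU support as being switched on through the `ARGOS_DEVICE_TYPE` environment variable (`cuda` or `auto`). CUDA is an NVIDIA technology, so I would not expect built-in AMD graphics to be picked up. A small sketch, where `select_device` is a hypothetical helper and the assumption is that the variable has to be set before the library is imported:

```python
import os


def select_device(device="auto"):
    """Set ARGOS_DEVICE_TYPE; call this before importing argostranslate."""
    if device not in ("cpu", "cuda", "auto"):
        raise ValueError(f"Unsupported device: {device}")
    os.environ["ARGOS_DEVICE_TYPE"] = device
    return device


select_device("auto")
# import argostranslate.translate  # import only after the variable is set
```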
So here is the issue: it takes over 4 hours to translate the 5 files in the script, and that is only for one EPG provider!!
I could refine the data to only what I require, mostly sport, movie and documentary channels, but ultimately I would still want all the data for all the sats, and that would still take forever.
So I am after help in refining the script to make it faster while still maintaining good XML structure.
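One thing that might help is keeping translations between runs, on the assumption that each new download repeats many of the titles and descriptions from the previous one. The script's cache is rebuilt from scratch for every file, so a small on-disk cache could avoid re-translating the same strings. A minimal sketch using SQLite from the standard library, where the class, function and file names are all hypothetical:

```python
import sqlite3


class TranslationCache:
    """Tiny on-disk cache mapping source text to translated text."""

    def __init__(self, path="translations.sqlite"):  # hypothetical file name
        self.db = sqlite3.connect(path)
        self.db.execute(
            "CREATE TABLE IF NOT EXISTS cache "
            "(source TEXT PRIMARY KEY, target TEXT)"
        )

    def get(self, source):
        row = self.db.execute(
            "SELECT target FROM cache WHERE source = ?", (source,)
        ).fetchone()
        return row[0] if row else None

    def put(self, source, target):
        self.db.execute(
            "INSERT OR REPLACE INTO cache (source, target) VALUES (?, ?)",
            (source, target),
        )

    def close(self):
        """Save pending entries to disk and close the database."""
        self.db.commit()
        self.db.close()


def cached_translate(text, translate_fn, cache):
    """Return the cached translation, calling translate_fn only on a miss."""
    hit = cache.get(text)
    if hit is None:
        hit = translate_fn(text)
        cache.put(text, hit)
    return hit
```

In the script this would replace the `translation_cache` dictionary, with `cache.close()` called once all files are done. How much time it saves depends entirely on how much the data overlaps from one run to the next.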
And/or other users could also translate EPG data. If we share the translations, we could all get more EPG sources translated to English.
Anyways, I'm probably waffling now, script below:
#!/usr/bin/env python3
import os
import requests
import lzma
import argostranslate.package
import argostranslate.translate
from lxml import etree
import time
from tqdm import tqdm

# Define the directory for downloading and saving files
directory = "/tmp"
os.makedirs(directory, exist_ok=True)

# List of URLs to download
urls = [
    "http://epgspot.com/rytec_epg/rytecDE_Basic.xz",
    "http://epgspot.com/rytec_epg/rytecDE_OtherMovies.xz",
    "http://epgspot.com/rytec_epg/rytecCH_Basic.xz",
    "http://epgspot.com/rytec_epg/rytecDE_Common.xz",
    "http://epgspot.com/rytec_epg/rytecDE_SportMovies.xz"
]

def process_file(url):
    """Download, extract, translate, and compress a file from a given URL."""
    local_filename = os.path.join(directory, os.path.basename(url))
    # Download the file
    download_file(url, local_filename)
    # Extract the downloaded file
    extracted_file = extract_file(local_filename)
    # Translate the extracted XML file
    translated_file = translate_xml(extracted_file)
    if translated_file is None:
        return None  # Translation was skipped, so there is nothing to compress
    # Compress the translated XML file
    return compress_file(translated_file)

def download_file(url, local_filename):
    """Download a file from the specified URL."""
    response = requests.get(url, timeout=60)  # Do not hang forever on a dead server
    response.raise_for_status()  # Raise an error for bad responses
    with open(local_filename, 'wb') as f:
        f.write(response.content)  # Write the content to a local file

def extract_file(local_filename):
    """Extract a .xz file to its uncompressed form."""
    output_filename = local_filename.replace('.xz', '')
    with lzma.open(local_filename) as f:
        with open(output_filename, 'wb') as out_file:
            out_file.write(f.read())  # Write the decompressed content
    return output_filename

def translate_text(text, translation):
    """Translate a single text string using the provided translation object."""
    return translation.translate(text)

def translate_xml(file_path):
    """Translate the text within an XML file."""
    from_code = "de"  # Source language code (German)
    to_code = "en"  # Target language code (English)
    # Get installed languages and set up the translation
    installed_languages = argostranslate.translate.get_installed_languages()
    from_lang = next((lang for lang in installed_languages if lang.code == from_code), None)
    to_lang = next((lang for lang in installed_languages if lang.code == to_code), None)
    if from_lang is None or to_lang is None:
        print("Error: Specified languages are not installed.")
        return None
    underlying_translation = from_lang.get_translation(to_lang)  # Set the translation object
    print(f"Translating XML file: {file_path}")
    # Parse the XML file
    tree = etree.parse(file_path)
    root = tree.getroot()
    translation_cache = {}  # Cache for translated strings
    translation_count = 0
    max_translations = 9999000  # Limit for translations
    start_time = time.time()  # Record the start time
    total_elements = sum(1 for _ in root.iter())  # Count total XML elements
    with tqdm(total=total_elements, desc="Translating", unit="element") as pbar:
        for elem in root.iter():
            if translation_count >= max_translations:
                break  # Stop if max translations reached
            # Translate text if present
            if elem.text and elem.text.strip():
                original_text = elem.text.strip()
                if original_text not in translation_cache:
                    translated_text = translate_text(original_text, underlying_translation)
                    translation_cache[original_text] = translated_text  # Cache the translation
                else:
                    translated_text = translation_cache[original_text]
                elem.text = translated_text
                translation_count += 1  # Increment the translation count
            # Handle tail text (text following an element)
            if elem.tail and elem.tail.strip():
                original_tail = elem.tail.strip()
                if original_tail not in translation_cache:
                    translated_tail = translate_text(original_tail, underlying_translation)
                    translation_cache[original_tail] = translated_tail
                else:
                    translated_tail = translation_cache[original_tail]
                elem.tail = translated_tail
                translation_count += 1
            pbar.update(1)  # Update the progress bar
    # Write the translated XML to a new file
    translated_file_path = file_path.replace("rytec", "translated")
    tree.write(translated_file_path, pretty_print=True, xml_declaration=True, encoding='utf-8')
    end_time = time.time()  # Record the end time
    processing_time = end_time - start_time  # Calculate processing time
    print(f"Successfully translated XML file to '{translated_file_path}'.")
    print(f"Total processing time: {processing_time:.2f} seconds.")
    return translated_file_path

def compress_file(file_path):
    """Compress a file using LZMA compression."""
    compressed_filename = file_path + ".xz"
    with open(file_path, 'rb') as f:
        with lzma.open(compressed_filename, 'wb') as out_file:
            out_file.write(f.read())  # Write the content to the compressed file
    print(f"Compressed '{file_path}' to '{compressed_filename}'.")
    return compressed_filename

def main():
    """Main function to process each URL."""
    for url in urls:
        process_file(url)  # Process each file one by one

if __name__ == "__main__":
    main()  # Execute the main function
As I said earlier, it takes ages to translate the data. Here are my run details from earlier today:
jason@Jason-Lappie:~/epgstuff$ python3 translator.py
Translating XML file: /tmp/rytecAT_Basic
Translating rytecAT_Basic: 100%|█████████████████████████████████████████████████████████████████████████████████| 30436/30436 [19:09<00:00, 26.47element/s]
Successfully translated XML file to '/tmp/translatedAT_Basic'.
Total processing time: 1150.00 seconds.
Compressed '/tmp/translatedAT_Basic' to '/tmp/translatedAT_Basic.xz'.
Translating XML file: /tmp/rytecDE_Basic
Translating rytecDE_Basic: 100%|███████████████████████████████████████████████████████████████████████████████| 87546/87546 [1:32:41<00:00, 15.74element/s]
Successfully translated XML file to '/tmp/translatedDE_Basic'.
Total processing time: 5562.21 seconds.
Compressed '/tmp/translatedDE_Basic' to '/tmp/translatedDE_Basic.xz'.
Translating XML file: /tmp/rytecCH_Basic
Translating rytecCH_Basic: 100%|█████████████████████████████████████████████████████████████████████████████████| 62480/62480 [32:49<00:00, 31.73element/s]
Successfully translated XML file to '/tmp/translatedCH_Basic'.
Total processing time: 1969.37 seconds.
Compressed '/tmp/translatedCH_Basic' to '/tmp/translatedCH_Basic.xz'.
Translating XML file: /tmp/rytecDE_Common
Translating rytecDE_Common: 100%|██████████████████████████████████████████████████████████████████████████████| 72050/72050 [1:18:40<00:00, 15.26element/s]
Successfully translated XML file to '/tmp/translatedDE_Common'.
Total processing time: 4720.23 seconds.
Compressed '/tmp/translatedDE_Common' to '/tmp/translatedDE_Common.xz'.
Translating XML file: /tmp/rytecDE_SportMovies
Translating rytecDE_SportMovies: 100%|███████████████████████████████████████████████████████████████████████████| 39091/39091 [38:32<00:00, 16.90element/s]
Successfully translated XML file to '/tmp/translatedDE_SportMovies'.
Total processing time: 2312.80 seconds.
Compressed '/tmp/translatedDE_SportMovies' to '/tmp/translatedDE_SportMovies.xz'.
jason@Jason-Lappie:~/epgstuff$
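To put the run above into numbers, here is a quick tally. The element counts and times are copied straight from the log, and nothing else is assumed:

```python
# (file, elements, seconds) copied from the run above
runs = [
    ("rytecAT_Basic", 30436, 1150.00),
    ("rytecDE_Basic", 87546, 5562.21),
    ("rytecCH_Basic", 62480, 1969.37),
    ("rytecDE_Common", 72050, 4720.23),
    ("rytecDE_SportMovies", 39091, 2312.80),
]

total_elements = sum(n for _, n, _ in runs)
total_seconds = sum(s for _, _, s in runs)

# Per-file throughput, then the overall figure
for name, n, s in runs:
    print(f"{name:22} {n / s:6.2f} elements/s")
print(f"Total: {total_elements} elements in {total_seconds / 3600:.2f} hours "
      f"({total_elements / total_seconds:.2f} elements/s overall)")
```

That works out to just under 292 000 elements in roughly 4.4 hours, or a little under 19 elements per second overall.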
Here is the source as I have added it to /etc/epgimport.
For now I am just storing the files locally.
<?xml version="1.0" encoding="utf-8"?>
<sources>
    <!--
    translated epg data test
    -->
    <mappings>
        <channel name="rytec.channels.xml.xz">
            <url>http://epgspot.com/rytec_epg/rytec.channels.xml.xz</url>
        </channel>
    </mappings>
    <sourcecat sourcecatname="Translated DE to EN @19.2east XMLTV">
        <source type="gen_xmltv" channels="rytec.channels.xml.xz">
            <description>Translated Sly DE - pt1</description>
            <url>/media/hdd/translated/translatedDE_SportMovies.xz</url>
        </source>
        <source type="gen_xmltv" channels="rytec.channels.xml.xz">
            <description>Translated Sly DE - pt2</description>
            <url>/media/hdd/translated/translatedDE_Common.xz</url>
        </source>
        <source type="gen_xmltv" channels="rytec.channels.xml.xz">
            <description>Translated Sly DE - pt3</description>
            <url>/media/hdd/translated/translatedDE_Basic.xz</url>
        </source>
        <source type="gen_xmltv" channels="rytec.channels.xml.xz">
            <description>Translated Sly DE - pt4</description>
            <url>/media/hdd/translated/translatedCH_Basic.xz</url>
        </source>
        <source type="gen_xmltv" channels="rytec.channels.xml.xz">
            <description>Translated Sly DE - pt5</description>
            <url>/media/hdd/translated/translatedAT_Basic.xz</url>
        </source>
    </sourcecat>
</sources>
Ultimately you end up with this:
The translated data is not perfect, but it is mostly fine. Besides, my broken English is way better than my broken German.
Anyways, I hope there is some interest in this. It would be a load easier for everyone if a few peeps get on board and help.
Let me know your thoughts. Any help with it would be appreciated.
