Datasets Ghostwriting – Python Ghostwriting – Dataset Ghostwriting

Assessment



Text documents, such as tweets, are usually composed of topically coherent segments. Within each segment, one would expect word usage to follow a more consistent lexical distribution than it does across the whole dataset. A linear partition of texts into topic segments can be used for text analysis tasks such as passage retrieval in information retrieval (IR), document summarization, recommender systems, and learning-to-rank methods.

Task 1: Parsing Text Files (60%)

Each text file contains information about the tweets, i.e., “user name”, “user code”, “user description”, “number of followers”, “whether or not the user account is verified”, “date of the tweet”, and the “tweet text”. Your task is to extract the data from the text file and transform it into an XML format with the following elements:

1. users: this tag wraps all the users

2. user: this tag wraps all the tweets from a particular user and keeps the metadata for each user, such as number of followers, verified or not, user description, etc. If a user has multiple tweets, the metadata of the latest tweet (i.e., the tweet with the most recent date) must be used.

3. Tweets: this tag wraps all the tweets of a specific user

4. tweet: for each user, this tag represents the text of the user's tweet
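The "latest tweet wins" rule for user metadata can be sketched as follows. This is only a minimal sketch under stated assumptions: the records are taken to be already parsed into dictionaries, the keys (`code`, `name`, `followers`, `date`, `text`) are hypothetical, and dates are assumed to be ISO-style strings that sort chronologically. The real field names and date format come from the provided input files.

```python
def group_by_user(records):
    """Group tweet records by user, keeping metadata from the most recent tweet.

    Assumption: each record is a dict with the hypothetical keys
    'code', 'name', 'followers', 'date' and 'text', and 'date' is an
    ISO-style string (YYYY-MM-DD), so string comparison matches date order.
    """
    users = {}
    for rec in records:
        # First record seen for a user provides the initial metadata.
        entry = users.setdefault(rec["code"], {"meta": rec, "tweets": []})
        entry["tweets"].append(rec["text"])
        # Replace the stored metadata when this tweet is more recent.
        if rec["date"] > entry["meta"]["date"]:
            entry["meta"] = rec
    return users


records = [
    {"code": "u1", "name": "Ann", "followers": "10", "date": "2020-01-05", "text": "first"},
    {"code": "u1", "name": "Ann", "followers": "25", "date": "2020-03-01", "text": "second"},
    {"code": "u2", "name": "Bob", "followers": "7", "date": "2020-02-11", "text": "hello"},
]
grouped = group_by_user(records)
```

If the actual dates are not in a sortable string form, they would first need to be converted (for example into tuples of integers) before the comparison.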

Please note that:

As we are dealing with large datasets, manual checking of outputs is impossible, so output files will be processed and marked automatically. Therefore, any deviation from the XML structure (i.e., task1_sample_output.xml), such as wrong key names (caused by different spelling, different upper/lower case, etc.), wrong hierarchy, or not handling the XML special characters, will result in receiving zero for the output mark.

Hint: run your code on the provided sample input and make sure that it produces exactly the same structure (not necessarily the same content) as the sample output. You can also use web applications such as xmlviewer to better understand the structure of the output.

VERY IMPORTANT NOTE: The sample outputs are only meant to help you understand the structure of the required output; the correctness of their content is not guaranteed. Please do not try to reverse engineer the outputs, as doing so will fail to generate the correct content.
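Handling the XML special characters can be done with built-in string methods alone. The sketch below escapes the five predefined XML entities; the function name `escape_xml` is hypothetical, and whether quotes need escaping in a given position depends on the structure of the sample output.

```python
def escape_xml(text):
    """Escape the five XML special characters using only built-in string methods."""
    # '&' must be replaced first; otherwise the '&' introduced by the
    # other entities would be escaped a second time.
    text = text.replace("&", "&amp;")
    text = text.replace("<", "&lt;")
    text = text.replace(">", "&gt;")
    text = text.replace('"', "&quot;")
    text = text.replace("'", "&apos;")
    return text


escaped = escape_xml('Tom & Jerry <3 "cartoons"')
```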

Please note that the re and os packages in Python are the only packages that you are allowed to use for Task 1 of this assessment (e.g., “pandas” is not allowed!). Any other package that you would need to “import” before use is not allowed.
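Reading the input files is possible with os and the built-in `open` alone. The following is a minimal sketch; the function name `read_text_files`, the `.txt` extension, and the UTF-8 encoding are assumptions, not details given in the brief.

```python
import os


def read_text_files(folder):
    """Return {filename: content} for every .txt file in `folder`.

    Assumptions: input files end in '.txt' and are UTF-8 encoded.
    """
    contents = {}
    for name in sorted(os.listdir(folder)):
        if name.endswith(".txt"):
            path = os.path.join(folder, name)
            with open(path, encoding="utf-8") as handle:
                contents[name] = handle.read()
    return contents
```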

The output and the documentation will be marked separately in this task, and each carries its own mark.


Output (50%)

See sample.xml for detailed information about the output structure. The following must be performed to complete the assessment.

  • Designing efficient regular expressions in order to extract the data from your textual dataset
  • Storing and submitting the extracted data in an XML file, <your_student_number>.xml, following the format of sample.xml
  • Explaining your code and your methodology in task1_<your_student_number>.ipynb with all the cells’ outputs
  • A PDF file, “task1_<your_student_number>.pdf”. You can first clear all the outputs in the Jupyter notebook task1_<your_student_number>.ipynb, export it as a PDF file, and then run all the code again to make sure your ipynb file has all the outputs. This PDF will be passed to Turnitin for a plagiarism check.
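As an illustration of the regular-expression step, the sketch below extracts three fields with named groups. The record layout (one "key: value" pair per line) and the labels used in the pattern are hypothetical; the real input files may look quite different, and the pattern would have to be adapted to them.

```python
import re

# Hypothetical layout: one "key: value" pair per line. The actual input
# format is not specified here, so this pattern is illustrative only.
RECORD_PATTERN = re.compile(
    r"user name:\s*(?P<name>.*?)\s*\n"
    r"number of followers:\s*(?P<followers>\d+)\s*\n"
    r"tweet text:\s*(?P<text>.*)"
)

sample = "user name: Ann Lee\nnumber of followers: 42\ntweet text: Hello & welcome\n"

match = RECORD_PATTERN.search(sample)
fields = match.groupdict()
```

Named groups keep the extraction readable and make it easy to map each captured value to its XML element later. For a file with many records, `RECORD_PATTERN.finditer` would yield one match per record.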

Methodology (25%)

The report should demonstrate the methodology (including all steps) to achieve the correct results.

Documentation (25%)

The solution used to obtain the output must be explained in a well-formatted report (with appropriate sections and subsections). Please remember that the report must explain both the obtained results and the approach used to produce them. You need to explain both the designed regular expression and the approach that you took to design it.

 
